Assessing the quality of an evaluation is an integral and fundamental part of the evaluation process. Indeed an evaluation that does not meet some minimum quality standards can very well mislead decision-makers and programme managers.
However, assessing evaluation quality is a complex and difficult process. The evaluations performed in the context of socio-economic development programmes and policies are too different from each other to allow for a few simple rules that can guarantee quality across the board.
By and large one can say that the quality of the evaluation as a whole is conditional upon the presence of three distinct but interrelated aspects:
- the quality of the planning and design phase, including the commissioning of the evaluation;
- the quality of the implementation of the evaluation itself;
- the quality of the monitoring system and of the available data.
These aspects are interrelated in the sense that poor performance by the evaluator can very well stem from the poor quality of the data or from flaws in the planning and design phase. Unfortunately, those involved in these three sets of activities are different, and often their goals, as well as their quality criteria, are also different. For instance, the monitoring system designed for the day-to-day management of the programme does not necessarily produce the data needed for an evaluation of impacts.
Furthermore these aspects can be seen from two different points of view.
In the first place, quality can be considered a characteristic of the process through which the evaluation activities are performed. The assessment of quality could include the way in which the commissioning authority develops the decision to proceed to an evaluation and defines its scope and the resources available. This can be analysed in order to understand whether the procedures followed were appropriate to the allocation of the different responsibilities, whether the contribution of the various stakeholders was taken into consideration, etc. The same goes for the performance of the evaluation. One can focus on the way in which the team, and its interaction with the commissioner and the evaluators, was managed, the checks that were put in place in order to ensure that the data collected were properly treated, etc. The organisation of the monitoring process can be assessed as well.
In the second place, quality is a characteristic of the products of the evaluation process. Thus one could analyse the ToR according to the criteria that we have already spelled out. Of course, one can assess the quality of the intermediate and final evaluation reports to see whether they meet some basic criteria of good professional practice and if the data are sufficient in quantity and reliable enough to warrant sound judgements.
In theory the two aspects, the process and the product, are linked: a good process should generate a good product, and the reverse is also true, in the sense that a good product should be the result of a good enough production process.
The MEANS Collection (1999) noted:
There is no system of professional certification, anywhere in the world, which institutionalises rules and quality criteria. Somewhat disparate grids of criteria are proposed, based on the evaluation models elaborated by various authors, but no consensus exists in this domain. Moreover, the nature of the criteria mentioned does not always appear clearly when an attempt is made to put them into practice.
Since then, however, some things have improved. In particular, as Box Standards Guidelines and Ethical Codes shows, it is now becoming common to define good practice standards in evaluation. These have been elaborated by international bodies (such as the OECD), national administrations (for example, the Italian Department for Economics and Finance) or professional associations such as national evaluation societies and associations. Many of these follow on from earlier efforts in the United States and can be traced back to the American Evaluation Association (AEA) Guiding Principles for Evaluators (1992) and the Joint Committee on Standards for Educational Evaluation's Program Evaluation Standards (1994).
Box Standards Guidelines and Ethical Codes provides a cross section of some current evaluation standards and codes. They fall into a number of categories. Most, in particular those that derive from the AEA Joint Standards such as the German Evaluation Society's (DeGEval) and the African Evaluation Guidelines, are directed primarily at the technical conduct of evaluation by evaluators, e.g., they concern how data is gathered and how conclusions are presented. (The distinction between guidelines and the more stringent and ambitious standards is also instructive.) Another category, of which the Canadian, the Australasian and to some extent the UK Evaluation Society's outputs are examples, is more concerned with ethical codes of practice than with the technical practice of evaluation. But again this mainly concerns the ethics of evaluators rather than of other implicated actors. Most recently a new category of guideline has emerged. This is directed more at administrations and those who commission evaluations than at evaluators. Examples of this can be found in the OECD (PUMA and DAC guidelines) and most recently in the European Commission.
Despite this growing array of guidelines, standards and codes that concern quality in evaluation there is not at present a common statement that has universal recognition.
Although there is not yet consensus about all the components of a quality assurance system for evaluation, we have begun to see a shift from a focus largely on quality control, i.e., ways of judging report/output quality, towards quality assurance, i.e., the quality of the evaluation process as a whole. This shift was endorsed by a recent study on the use of evaluation by the European Commission (Box Quotation from EU research on use of evaluation).
Box Quality control and quality assurance criteria identifies both quality control and quality assurance criteria. Both are needed as a means of judging evaluation reports and outputs. Normally the person responsible for managing the evaluation within the commissioning body would take responsibility for applying the quality control criteria. Ideally, performance on the quality assurance criteria needs to be informed by the views of members of the Steering Committee, other stakeholders, the evaluation team and those responsible for managing the evaluation on behalf of the commissioning body. The Steering Committee should provide the criteria as early as possible in the evaluation assignment and is normally best placed to make the overall assessment at the completion of the work. However, for quality assurance that rests on process criteria, consultation with other stakeholders not necessarily represented on a steering committee will be necessary. For quality control purposes, consultation with external experts or referees can be useful. It needs to be emphasised that quality control (content-type) criteria and quality assurance (process-type) criteria are applied for different purposes. Quality control of report content offers some assurance that the work has been properly conducted and that its conclusions can be relied on. Quality assurance of the evaluation process will contribute more to learning about evaluation management and provide inputs that should improve future evaluation management. The quality control and quality assurance criteria are elaborated in Box Quality control and quality assurance criteria.
Who should be responsible for a quality control and quality assurance procedure will vary with the institutional context. In national sectoral programmes, this may be a central government responsibility and in local development programmes, the responsibility may rest with local actors. The methods of application will be similarly varied - sometimes a grid may be filled out by key individuals and aggregated, but on other occasions a workshop or consensus conference may ensure the most balanced judgements.
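The option of having key individuals each fill out a grid and then aggregating the results can be sketched as follows. This is a minimal illustration only: the assessor names, the scores, the numeric coding of the scale (-2 to +2) and the use of the median as the aggregate are all assumptions made for the example, not something this guide prescribes.

```python
from statistics import median

# Hypothetical ratings on a five-point scale
# (-2 = very negative ... +2 = very positive).
# Assessor names and scores are invented for illustration.
grids = {
    "commissioner": {"Meeting needs": 1, "Reliable data": -1, "Clear report": 2},
    "steering_committee_member": {"Meeting needs": 2, "Reliable data": 0, "Clear report": 1},
    "external_referee": {"Meeting needs": 1, "Reliable data": -2, "Clear report": 1},
}


def aggregate_grids(grids):
    """Return the median rating per criterion across all assessors."""
    per_criterion = {}
    for ratings in grids.values():
        for criterion, score in ratings.items():
            per_criterion.setdefault(criterion, []).append(score)
    return {criterion: median(scores) for criterion, scores in per_criterion.items()}


aggregated = aggregate_grids(grids)
for criterion, score in aggregated.items():
    print(f"{criterion}: {score}")
```

The median is used here because it is less sensitive to a single outlying assessor than the mean; where assessors disagree strongly, a workshop discussion may be more informative than any numerical aggregate.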
See also quality control.
Quality control - output criteria
Meeting needs
Has the evaluation answered the questions included in the ToR satisfactorily and does the report provide additional information that might be essential for the commissioners? In particular:
- Has the way programme or intervention objectives evolved and been interpreted been analysed?
- Does the report cover the entire programme? If not, is the selection justified as regards the priorities stated by the commissioners in the ToR and subsequently?
- Does the evaluation provide useful feedback for programme managers?
- Does it include lessons on successes and failures that may be of interest to other programmes, regions or countries?
For ex post evaluations it is important to check whether the evaluation has managed to reach a reasonable compromise between the following two contradictory requirements: rapidly obtaining information for feeding into the new programme cycle and not drawing hasty conclusions before all the impacts have been observed.
Relevant scope
In order to check the relevance of the scope of an evaluation, it is necessary first to check whether the essential characteristics of the programme or intervention have been well described and whether the problems and successes in implementation have been properly clarified.
Secondly, because the results and impacts have to be analysed in order to judge the extent to which objectives have been achieved, it is necessary to check whether they have been included in the evaluation. It is also necessary to check whether the evaluation has overlooked other potential or future results or impacts, as well as any unexpected yet significant effects and results that may exist.
Finally, the scope of an evaluation depends on the programme or intervention target that can be defined in terms of eligible geographical areas or non-localised target groups (e.g., the long-term unemployed). It is therefore necessary to check whether:
- the limits of the scope, in terms of areas or groups, are defined according to the logic of the intervention;
- the scope includes peripheral areas or non-eligible groups which are nevertheless likely to be affected by the evaluated interventions;
- lastly, if the evaluation considers the evaluated programme or intervention in isolation or includes its interactions with other European or national programmes.
Defensible design
This criterion relates to the technical qualities of the evaluation. Methodological choices must be derived from the evaluative questions. The evaluation must, moreover, make the best possible use of existing research and analyses. Three types of question have to be asked:
- Has the relevant knowledge been collected and used wisely?
- Are the construction of the method and the choice of tools really justified for answering the evaluative questions properly?
- Were the reference situations chosen (counterfactual or similar) appropriate for making valid comparisons?
Any evaluation report must include a description of the method used and clearly define the sources of data. Similarly, the limits of the method and the tools used must be clearly described. It is necessary to check whether:
- the method is described in enough detail for the quality to be judged;
- the validity of data collected and tools used is clearly indicated;
- the available data correspond to the tools used.
Because a causal analysis of effects is the most important question in ex post evaluations, the method used to analyse these causal relations is the priority in this type of evaluation. It is necessary to check whether the evaluation adequately analyses relations of cause and effect for the most essential questions.
Reliable data
Evaluators use existing data (secondary data) from the monitoring system and from other sources of information, or else primary data that they have collected for the evaluation. In the latter case, the methods used to collect and process the data (choice and application of the tools used for this purpose) are very important factors in the reliability and validity of the results.
In order to assess the reliability of the data used, it is necessary to examine whether:
- available sources of information have been identified and the reliability of this data has been checked;
- sources of information taken from the monitoring system and previous studies have been used optimally;
- the techniques used to collect the chosen data were complete and suitable for answering the evaluative questions.
Whether the collection of data used quantitative or qualitative techniques or a combination of both, it is necessary to inquire if:
- the mixture of qualitative and quantitative data is appropriate for a valid analysis of the phenomenon;
- the "populations" used for data collection have been correctly defined;
- the survey samples or cases studied have been selected in relation to established criteria;
- the main data collection techniques have been implemented with appropriate tools and in such a way as to guarantee an adequate degree of reliability and validity of the results.
Sound analysis
Quantitative analysis consists of the systematic analysis of data using statistical and other techniques. It has a particular focus on numerical values. Qualitative analysis consists of the systematic comparison and interpretation of information sources in the form of cross-referencing, with a particular focus on why things happen. In both cases it is necessary to assess whether the methods of analysis used are relevant as regards the type of data collected and whether the analysis has been carried out to an appropriate quality.
In the case of socio-economic development, relations of cause and effect are complex and therefore constitute a particular challenge for evaluation. It is necessary to check:
- whether the relations of cause and effect underlying the programme are sufficiently explicit and relevant so that the object of analysis can be focused, and
- to what extent the analysis uses suitable techniques.
For this reason, a comparison between beneficiaries and a control group, or at least a before-after comparison, is recommended.
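The two comparisons mentioned here can be illustrated with a short sketch. The outcome figures are hypothetical, and combining the two comparisons into a difference-in-differences estimate is a common technique that is assumed for the example; the guide itself does not name it.

```python
from statistics import mean

# Hypothetical outcome data (e.g. an employment rate in %), for illustration only.
beneficiaries = {"before": [40.0, 42.0, 38.0], "after": [50.0, 51.0, 49.0]}
control_group = {"before": [41.0, 39.0, 40.0], "after": [44.0, 43.0, 45.0]}


def before_after_change(group):
    """Simple before-after comparison: mean outcome after minus mean outcome before."""
    return mean(group["after"]) - mean(group["before"])


def difference_in_differences(treated, control):
    """Change among beneficiaries minus change in the control group."""
    return before_after_change(treated) - before_after_change(control)


change_beneficiaries = before_after_change(beneficiaries)
change_control = before_after_change(control_group)
estimated_effect = difference_in_differences(beneficiaries, control_group)

print(f"Before-after change, beneficiaries: {change_beneficiaries}")
print(f"Before-after change, control group: {change_control}")
print(f"Difference between the two changes: {estimated_effect}")
```

In this invented example the before-after comparison alone would attribute the whole change among beneficiaries to the programme, whereas the control group suggests that part of it would have happened anyway. The estimate is only as good as the comparability of the two groups.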
Credible results
The credibility of results is defined here as the extent to which they follow logically from, and are justified by, the analysis of data and interpretations based on carefully presented explanatory hypotheses. The validity of the results must be satisfactory. This means that the balance between internal validity (absence of technical bias in the collection and processing of data) and external validity (representativeness of results) must be justifiable. It is also necessary to check whether the results of the analysis were produced in a balanced and reliable way.
The need to perform in-depth analyses of a part of the programme poses the problem of extrapolation, from case studies, for the programme as a whole. In this context, it is necessary to check that:
- the interpretative hypotheses and extrapolations are justifiable and the limits of validity have been defined;
- the selection of cases and samples makes it possible to generalise the findings.
Impartial conclusions
Conclusions include suggestions and sometimes recommendations that are more than results. Whereas results are "technical" and can be analysed without too much risk of partiality, conclusions and, a fortiori, recommendations are issued on the basis of value judgements. The quality of the judgement is thus decisive.
To answer the question "Are the conclusions fair, free of personal or partisan considerations and detailed enough to be implemented concretely?", it is necessary to check that:
- the elements on which the conclusions are based are clear;
- the conclusions and recommendations are operational and sufficiently explicit to be implemented;
- controversial questions are presented in a fair and balanced way.
Key questions such as relevance, effectiveness and efficiency must be addressed within the framework of an evaluation and must therefore be answered appropriately. The evaluation report must also show the appropriateness of the budget for the programme or intervention.
Essential questions such as the value added of the programme or intervention and progress made in terms of transversal goals like cohesion, subsidiarity, good governance, sustainable development and equal opportunities need to be studied. In the case of ex ante evaluations of programmes, conclusions need to be formulated so as to feed into the process of negotiation on the evaluated programme. The report should make it possible to improve the evaluability of the programme or intervention.
Clear report
Evaluation results can be disseminated and communicated to the stakeholders in writing or verbally. The final report is only one means of diffusion and continual communication of results is desirable. The clarity of the report will depend on the quality of the presentation of results and the limits of the work performed. It is necessary to check that:
- the report was written clearly and is set out logically;
- specialised concepts are used only when absolutely necessary and they are clearly defined;
- presentation, tables and graphs enhance the legibility and intelligibility of the report; and
- the limits of the evaluation, in terms of scope, methods and conclusions, are clearly shown.
In many cases only the summary of a report is read. It is therefore essential for this summary to be clear and concise. It must present the main conclusions and recommendations in a balanced and impartial manner. It must be easy to read without the need to refer to the rest of the report.
Quality assurance criteria
The next set of criteria concerns the overall process and context of the evaluation: quality assurance rather than quality control. It will allow those assessing quality both to understand what might account for positive and negative aspects of the evaluation outputs and draw lessons that could be applied in order to improve the quality of future evaluations.
Coherent and evaluable objectives
The coherence of the objectives (the extent to which they are specific, linked to interventions, not contradictory, etc.) has been discussed earlier. It was noted that logic models, programme theory and theory of change approaches are useful ways to clarify programme objectives and the logic of interventions at the early stages of a programme prior to the launch of an evaluation. At this stage we are interested in the outcomes of this earlier process. How far were the evaluators dealing with a coherent programme or intervention in terms of objectives? Were any evaluation difficulties the result of poorly articulated objectives or other problems of evaluability?
Well drawn terms of reference
Sound terms of reference make for effective evaluations. To an extent it is possible at the time they are drafted to judge the adequacy of a ToR. It also becomes easier with hindsight to identify what might have usefully been included. This is important for future learning, i.e., how to improve ToRs in the future.
A poor or incomplete ToR can lead evaluators to deploy their resources inappropriately. It can also lead to other negative effects. One common consequence is that gaps in the ToR become evident in the course of an evaluation and the commissioner struggles to redirect the evaluation mid-way or to request additional outputs that were not planned for or budgeted.
Sound tender selection process
Was the tender selection process well conducted? This is both a procedural question and a matter of substance. Procedurally an assessment should be made of the systematic application of relevant criteria at selection. Substantively we are interested in whether the right decision was made. For example, was a decision taken to favour a well-known firm but the time commitment of key personnel was inadequate? Was the method too loosely specified? Or was an experimental high-risk method favoured and could this account for problems encountered later?
Effective dialogue and feedback throughout evaluation process
Keeping an evaluation on track, providing feedback and providing a forum for stakeholders to learn through dialogue with each other and with the evaluators is a recognised prerequisite for quality in evaluation. This is partly a question of the forum created for this purpose: most obviously a Steering Committee, but possibly also specific briefing meetings and workshops, e.g., briefing workshops for local politicians and policy makers. The inclusiveness of the membership of such meeting places needs to be assessed: were all the right stakeholders and publics involved?
The purpose of these opportunities for briefing and exchange is the dialogue and feedback that they enable. Was good use made of Steering Committee meetings? Were the agendas appropriate? Did stakeholders see these opportunities as productive and enhancing their understandings? Did they ultimately help shape and improve the quality and usefulness of the evaluation?
Adequate information resources available
Evaluators need information. Part 4 of this GUIDE emphasises the importance of data availability and monitoring systems. Without adequate information resources it is difficult for evaluators to do good work. An assessment therefore needs to be made of the adequacy of information. Most obviously this concerns monitoring information and systems. Often monitoring systems emphasise the needs of external sponsors and funders. They also need to be able to help programme managers and an evaluation usually reveals the extent to which they do. Evaluators will also need to draw on secondary administrative data, gathered often for other purposes by local, regional and national administrations.
Much information in an evaluation is held in the minds of key informants. This is especially so for contextual and qualitative information, which is important not only for understanding the programme but also for interpreting more formal data.
Overall, in order to judge the quality of the process and context of the evaluation there needs to be an assessment first of whether information existed and second whether it was made available. For example, in some programmes there may be data available such as administrative returns on local employment or the minutes of management committees of particular projects or sub-programmes but these are difficult to access. It may also be that the key informant refuses to provide evaluators with information perhaps because of poor relations between the involved stakeholders and administrations. To that extent, judgements about the availability of information and data to evaluators can provide data about the actual state of partnership and inter-agency cooperation.
Good management and co-ordination by evaluation team
However well planned and however convincing the workplan and inception report, all evaluations need to be executed properly. They need both to follow plans and be able to adapt to unexpected events that make plans - or aspects of them - redundant. Teams need to be kept together and the different work components need to be co-ordinated and their outputs integrated. Relations with commissioners of evaluation, programme managers and a whole variety of informants, fieldsites, implicated institutions, groups and associations have to be managed.
These aspects of management are mainly the responsibility of the evaluation team and its managers. However there are also elements that are shared with programme managers and those who are responsible for commissioning the evaluation. For example, how the commissioning system responds to requests to adapt a previously made workplan is not in the control of the evaluation team alone.
Effective dissemination of reports/outputs to Steering Committee and policy/programme managers
Report dissemination is another shared responsibility. In part it depends on the ability of the evaluation team to produce high quality and well-drafted outputs. (This is covered in terms of quality control above.) It also requires an awareness of the value and opportunities for dissemination within the evaluation team. There is for example a big difference between evaluators who confine their feedback to the contractual minimum and those who see it as their responsibility to provide ad hoc feedback when new problems occur or when key issues need to be resolved.
This kind of dissemination also requires sensitivity to the information needs and interests of key stakeholders. Sometimes outputs need to be tailored to meet quite different interests. For example, programme managers will have a different perspective from local SMEs even though they will also share certain interests.
Effective dissemination to stakeholders
Reports and outputs need to be disseminated if they are to facilitate learning. An evaluation process should not be considered complete until a programme of dissemination has taken place. The general requirements for such dissemination should have been signalled in the ToR. However, primary responsibility does not rest with evaluators. Programme managers and those who commission evaluations should take responsibility for dissemination to stakeholders, including the public at large.
The synthetic assessment
The synthetic assessment recapitulates all the above quality criteria. It is difficult to recommend any particular weighting for the different criteria because their importance varies from one situation to the next.
Box Quality control and quality assurance criteria indicates a grid for the quality control of the evaluation report. Box Grid for an assessment of the quality of the evaluation process provides a quality assurance grid.
In both cases a five point rating scale is used. This runs from the positive (where very positive indicates the end point) to the negative (where very negative indicates the end point). Thus there are two positive possibilities and two negative possibilities and a mid-point when the balance of judgement is uncertain.
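As a rough illustration of how such a grid and a synthetic assessment might be recorded, the sketch below encodes the five-point scale and the eight quality control criteria listed earlier. The numeric coding (-2 to +2), the label for the mid-point, the example ratings and the default of equal weights are all assumptions made for the example; as noted above, no particular weighting is recommended.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Five-point scale; the numeric values are an assumed coding."""
    VERY_NEGATIVE = -2
    NEGATIVE = -1
    UNCERTAIN = 0  # mid-point, when the balance of judgement is uncertain
    POSITIVE = 1
    VERY_POSITIVE = 2


QUALITY_CONTROL_CRITERIA = [
    "Meeting needs",
    "Relevant scope",
    "Defensible design",
    "Reliable data",
    "Sound analysis",
    "Credible results",
    "Impartial conclusions",
    "Clear report",
]


def synthetic_assessment(ratings, weights=None):
    """Weighted average of the ratings; equal weights unless others are given."""
    missing = [c for c in QUALITY_CONTROL_CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"No rating given for: {missing}")
    if weights is None:
        weights = {c: 1.0 for c in QUALITY_CONTROL_CRITERIA}
    total_weight = sum(weights[c] for c in QUALITY_CONTROL_CRITERIA)
    weighted_sum = sum(weights[c] * ratings[c] for c in QUALITY_CONTROL_CRITERIA)
    return weighted_sum / total_weight


# Hypothetical ratings for a single evaluation report.
example_ratings = {
    "Meeting needs": Rating.POSITIVE,
    "Relevant scope": Rating.POSITIVE,
    "Defensible design": Rating.UNCERTAIN,
    "Reliable data": Rating.NEGATIVE,
    "Sound analysis": Rating.POSITIVE,
    "Credible results": Rating.POSITIVE,
    "Impartial conclusions": Rating.VERY_POSITIVE,
    "Clear report": Rating.POSITIVE,
}

score = synthetic_assessment(example_ratings)
print(f"Synthetic score on the -2..+2 scale: {score}")
```

A single number of this kind is at most a summary; the ratings on the individual criteria, and the reasons given for them, carry most of the information. The same structure could be reused for the quality assurance grid by substituting the process criteria.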