Structuring risks and solutions in the use of big data sources for producing official statistics – Analysis based on a risk and quality framework

Document date: Friday, 1 May 2015
Wirthmann A., Karlberg M., Kovachev B., Reis F.

An increasing number of statistical offices are exploring the use of Big Data sources for the production of official statistics. For the time being, there are only a few examples where these sources have been fully integrated into actual statistics production. Consequently, the full extent of the implications of their integration is not yet known. Meanwhile, first attempts have been made to analyse the conditions and impact of Big Data on different aspects of statistical production, such as quality or methodology. A recent task team elaborated a quality framework for the production of statistics from Big Data in the context of the Big Data project of the United Nations Economic Commission for Europe (UNECE). According to the European Statistics Code of Practice, the provision of high-quality statistical information is the main objective of statistical offices. Since risk is defined as the effect of uncertainty on objectives (e.g. by the International Organization for Standardization's ISO 31000), we have found it appropriate to categorise risks according to the quality dimensions they affect. The suggested quality framework for statistics derived from Big Data sources provides a structured view of quality related to all phases of the statistical business process and thus may serve as a basis for a comprehensive assessment and management of risks related to these new data sources. It introduces new quality dimensions that are specific to, or of high importance when, using Big Data for official statistics, such as institutional/business environment or complexity. Using these new quality dimensions, it is possible to derive risks related to the use of Big Data sources in official statistics in a more systematic way.

In the present paper we aim to identify risks induced by the use of Big Data in the context of official statistics. We follow a systematic approach of defining risks in the context of the suggested quality framework. By concentrating on the newly proposed quality dimensions, we are able to describe risks that are currently not present in, or do not have an impact on, the production of official statistics. At the same time, we are able to identify current risks that will be evaluated very differently when using Big Data for producing statistics. We then go further into the risk management cycle and provide an assessment of the likelihood and impact of these risks. As the assessment of risks involves subjectivity in attributing likelihood and impact to the different risks, we measure the agreement between the scores given independently by different stakeholders. We then propose options for reducing these risks according to the four major categories: avoidance, reduction, sharing and retention. According to ISO, one of the principles of risk management should be to create value, i.e. the resources spent on mitigating risks should be lower than the cost of doing nothing. Following this principle, we finally assess the possible impact of some risk mitigation actions on the quality of the final outputs, in order to arrive at a more comprehensive assessment of Big Data usage for official statistics.
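
To illustrate the assessment step described above, the following minimal Python sketch scores risks by likelihood and impact and measures how far two stakeholders agree. Everything in it is an assumption made for illustration: the abstract does not state the scoring scale, the way likelihood and impact are combined, or the agreement measure used. The sketch assumes a 1 to 5 ordinal scale, a multiplicative risk score, and Cohen's kappa for two raters; the risk names and scores are hypothetical.

```python
from collections import Counter


def risk_score(likelihood: int, impact: int) -> int:
    """Combine likelihood and impact into one score (assumed: simple product)."""
    return likelihood * impact


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical risks with (likelihood, impact) scores from two stakeholders.
stakeholder_1 = {"loss of data access": (4, 5), "privacy breach": (2, 5), "skills gap": (3, 3)}
stakeholder_2 = {"loss of data access": (4, 4), "privacy breach": (2, 5), "skills gap": (3, 2)}

risks = sorted(stakeholder_1)
likelihood_kappa = cohens_kappa(
    [stakeholder_1[r][0] for r in risks],
    [stakeholder_2[r][0] for r in risks],
)
scores_1 = {r: risk_score(*stakeholder_1[r]) for r in risks}
print(likelihood_kappa, scores_1)
```

Cohen's kappa treats the scores as unordered categories; for ordinal scales a weighted variant, or a multi-rater measure when more than two stakeholders are involved, may be a better fit.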