WPK Overview


The aim of this work package is to consolidate the knowledge gained in this ESSnet (and, to a limited extent, outside it, in academia and the non-ESS official statistics community) in the area of methodology and quality when using big data in the statistical production process, and to combine it with the insights from the previous ESSnet. Several key deliverables from the previous ESSnet will be used, extended and enhanced, especially:

  1. the literature overview;
  2. an overview of quality issues and their possible solutions;
  3. an overview of methodologies applied and challenges identified.

Since the process and architecture are now dealt with in WPF, and these issues cannot be seen as completely independent from methodology and quality, there is a need for coordination between WPK and WPF. Experiences obtained in the pilots of WPB to WPE and WPG to WPJ, and in WPL, are used as input to this WP. WPL, on preparing smart statistics, is mentioned here explicitly in addition to the WPs of the implementation and pilot tracks, because new methodological and quality developments might happen in this new research area as well. To achieve a high degree of acceptance, feedback from relevant ESS bodies (e.g. the TF/SG BD, WG Qual, WG Meth, DIME) will be collected for the key envisaged deliverables (especially the guidelines and the template).

Description of work

The following topics included in key deliverables of the previous ESSnet will be extended and enhanced.

1. Literature

The literature review from the previous ESSnet (SGA-2) is extended and now focuses on the area of quality indicators as outlined in the details below. As a next step, quality indicators should be researched, developed and/or described.

2. Quality

Based on the report on quality delivered by the previous ESSnet on big data, the quality framework for big data will be further enhanced. This work has three goals. Firstly, the information from the current pilots will be used to update the report. Secondly, quality guidelines for the usage of big data in official statistics will be written based on the know-how from within the project and from outside sources, e.g. the UNECE big data quality framework. The third goal is to develop a template for a quality report when using big data in the production of statistics. Like the methodological part, the quality report will take into account the diverse nature of big data sources as well as the access to the data sources.

3. Methodology

The methodology report from the previous ESSnet is extended accordingly. The methodological steps in the general stepwise approach for producing new statistics, proposed in the previous methodological report, will be further elaborated and converted into concrete methodological guidelines. The methodological report will also take into account the diverse nature of big data sources as well as the access to the data sources. A link is also made here to the effect of working in a data-driven way vs. working in the more traditional theory-driven way. This fundamental difference, i.e. induction vs. deduction, tremendously influences the way statistics are produced.

The choice of a specific method can directly influence quality or quality indicators, or might even bring its own quality indicators with it; methodology and quality are therefore strongly connected. Consequently, the quality and methodological frameworks interact and need to be set up in a coordinated way, and it is a goal of the WP to make these frameworks compatible with each other.

The following topic will also be included in this WP:

4. Typification

Big data projects can draw on more than one data source, but even in the simplest case, when only one major data source is being explored, it is hard to establish its overall level of maturity with regard to statistical exploitation. Establishing this maturity level is of great importance in helping NSIs to define their strategy for statistical production and to decide on the investment that should be devoted to the source, or that the source requires, to reach a mature level.

Three strands of big data projects were clearly identified in the previous ESSnet. They reflect the level of maturity of a big data project and will be further investigated in the current ESSnet. The three stages are:

  1. Big Data Exploratory Projects – in which the WP starts investigating the data source, its potential, its quality and methodological problems, and its possible uses;
  2. Big Data Piloting Projects – in which pilot cases are developed to explore the uses devised and to deal with the situations anticipated, but in which many other problems will still be encountered;
  3. Big Data Implementation Projects – in which at least part of the statistical production can make use of the big data source, the process is implemented and the practical difficulties are faced.

All of the stages require the same basic building blocks and have to go through the same phases: ascertaining the data quality, developing methodologies to deal with the source, ensuring data storage and processing, and so on. However, the level of constraints and rigor that is demanded, and that can be achieved, varies greatly. For example, several discovery strategies that are desirable in the exploratory case may still be tolerated when piloting but are not possible at all during implementation.

Once the major drawbacks of a big data source/project are identified, its potential to evolve and the areas where investment is required will be easier to track by following the evolution roadmap. The roadmap will be of paramount importance in shortening the time needed to progress from less to more developed stages and in achieving more mature solutions at lower cost.

Expected outcomes

  1. An updated and extended literature overview.
  2. The main outputs of the quality part are the quality guidelines and the template for quality reporting. If the literature review yields feasible results, the quality report template can also include suggestions for actual quantitative quality indicators. The quality dimensions will be developed within a framework that takes into account previous developments related to the use of administrative data (ADMIN).
  3. The main outputs of the methodological part are an overview of the methodological findings when using big data in official statistics and a first version of methodological guidelines for the production of new statistical products.
  4. A typification matrix will be developed to assist the future evaluation of big data sources/projects. An evolution roadmap to navigate across the typification matrix will also be produced.

Task 1 - Literature overview

Performed by Statistics Austria, ISTAT (Statistics Italy), CBS (Statistics Netherlands), GUS (Statistics Poland) and INE-PT (Statistics Portugal)

WPK starts by updating the literature overview of the previous ESSnet (see ESSnet Big Data I Deliverable 8.1). Apart from the most important publications in the area of big data, the continuously updated reports of the pilots of this ESSnet will be included in this study. A new focus of the literature overview is to identify useful quality indicators that could be used to quantify quality dimensions specific to the use of big data sources in the production of statistics. This task runs from M1 to M25.

Task 2 - Quality of big data

Performed by Statistics Austria, INE-ES (Statistics Spain), ISTAT (Statistics Italy), CBS (Statistics Netherlands), GUS (Statistics Poland) and INE-PT (Statistics Portugal)

Assessing quality is crucial when producing official statistics. Besides the previous ESSnet, the quality framework developed by the UNECE and other external sources serve as input for developing quality guidelines. This work is divided into the following subtasks:

  • An update of the report of the previous ESSnet, including the important findings of the current pilots
  • Quality guidelines for the usage of big data in official statistics
  • A proposal for a template to report quality in the case of big data usage

Task 3 - Big data methodology

Performed by Statistics Austria, INE-ES (Statistics Spain), ISTAT (Statistics Italy), CBS (Statistics Netherlands), GUS (Statistics Poland) and INE-PT (Statistics Portugal)

The methodological report of the previous ESSnet will be updated with new findings from the literature review and the outcome of the pilots. Based on the methodological report and the general stepwise approach proposed in it, a more advanced framework will be developed and concrete methodological guidelines will be produced. Future challenges will also be identified.

Task 4 - Big data project typification

Performed by Statistics Austria, ISTAT (Statistics Italy), CBS (Statistics Netherlands), GUS (Statistics Poland) and INE-PT (Statistics Portugal)

Although the insight about a possible typification of big data projects emerged from the previous ESSnet, it has not yet been explicitly expressed. A task of the current ESSnet is therefore to take advantage of the three strands to explore and describe, in a more comprehensive and exhaustive way, the elements that classify a big data project into one of the three categories.

Task 5 - Meetings for the Pilot track

Performed by Statistics Austria, ISTAT (Statistics Italy), CBS (Statistics Netherlands), GUS (Statistics Poland) and INE-PT (Statistics Portugal)

Two meetings of two full days each will be organised, bringing together members of WPK with members of all pilot work packages (WPG-WPJ): a kick-off meeting in M2 and an intermediate meeting in M14. The objective of these meetings is to coordinate and organise the work of the whole pilots track together. Each meeting will be hosted by one of the WPK partners, and an agenda/programme will be prepared. The meeting notes will be consolidated into minutes and provided by WPK.

Milestones and deliverables

The milestones and deliverables of WPK are listed below.

WPK milestones

  KM1   Report on the kick-off meeting for the pilots track   Month 3
  KM2   Report on the WP meeting mid-2019   Month 9
  KM3   Report on the mid-term meeting for the pilots track   Month 14
  KM4   Report on the WP meeting mid-2020   Month 20

WPK deliverables

  K1   First draft of the quality guidelines   Month 9
  K2   Updated literature overview with outside sources and additional input from the ESSnet   Month 13
  K3   Revised version of the quality guidelines   Month 13
  K4   Quality report template draft   Month 13
  K5   First draft of methodological report   Month 17
  K6   Revised quality report template   Month 17
  K7   Typification Matrix for Big Data Projects   Month 18
  K8   Evolution Roadmap between the Areas of the Typification Matrix   Month 24
  K9   Revised version of the methodological report   Month 25
  K10   Report describing the methodological steps of using big data in official statistics with a section on the most important research questions for the future including guidelines   Month 25
  K11   Report describing the quality aspects of the different pilots   Month 25
  K12   Revised literature overview   Month 25