Emerging data analytics tools and techniques


Countless advanced tools for data analytics are being developed. Some have huge potential, whereas others appear not to be applicable to official statistics (for instance, because of their "black box" or proprietary nature), and yet others look impressive but are in reality just statistical methods relabelled as "data science".

At this daWos session, we will discuss how to cut through the hype to find and deploy the modern data analysis tools and techniques that are genuinely useful for official statistics.

Questions that could be addressed include:

  • What are the state-of-the-art and/or advanced analytical tools and techniques already in use in the ESS? Automatic data retrieval, web-scraping, data cleaning, natural language processing, machine learning, pattern recognition…?
  • How to achieve transparency and maintain independence:
    • What are the prospects for running data analytics projects without blindly following the private sector (e.g., use of black-box models, adoption of dedicated infrastructure over which statistical institutes have little to no control)?
    • When adopting black-box tools (e.g., commercial off-the-shelf software or pre-trained models), how should this be communicated?
  • What about the sustainability and reliability of data analytics tools and techniques to support the creation of official statistical products?
  • How can knowledge acquired within the ESS be maintained efficiently and/or transferred effectively? How can technical solutions (tools and software) be shared (e.g., through a SERV[1]-like project)? Is there a need for a common "playground" platform (e.g., like the UNECE sandbox[2]) to share data and enable users to analyse them?

[1] https://ec.europa.eu/eurostat/cros/content/ess-vision-2020-shared-services_en.