European Commission Digital

How CEF Big Data Test Infrastructure is helping to modernise official statistics

European statistical offices use BDTI to experiment with big data and to find new ways of collecting statistics for more accurate results.



Big Data Hackathon 2019: Director-General of Eurostat, Mariana Kotzeva, in the middle with the top three winning teams. Photo courtesy of Eurostat.


Quick facts

  • Project: European Big Data Hackathon 2019
  • Organisation: Eurostat (European Commission) with National Statistical Institutes (NSIs)
  • Challenge: How to modernise statistics with automated data collection and more accurate indicators to better support policy decisions?
  • Solution: Experiment with big data from mobile devices to create smart surveys for more accurate statistical indicators
  • Building block: Big Data Test Infrastructure (BDTI)
  • CEF funding: Yes


Redefining statistics with big data

In an effort to continuously improve its statistical services, Eurostat organised the ‘Big Data Hackathon 2019’ in association with a number of European National Statistical Institutes (NSIs). The purpose was to foster innovation through collaboration with third-party developers. Participants from all across Europe had two days to experiment with data and devise new indicators on time use. The hackathon was based on a large amount of data, which, in a first for Eurostat, was collected through mobile phones. Data was made available to each team using the European Commission’s Big Data Test Infrastructure (BDTI), a digital building block offered by the Connecting Europe Facility (CEF) programme. BDTI offers virtual testing tools and big data expertise free of charge to public administrations in the EU.


The importance of statistics

Eurostat is the European Commission’s body responsible for providing high-quality statistical information that is reliable and objective. The importance of statistics goes beyond "nice-to-know" benchmarking of countries and regions. Statistics play an important role in adding transparency to the current state of society and the performance of politicians. They also help policy makers and policy analysts reach informed decisions [1].

The European Statistical System (ESS) is the partnership between the Community statistical authority, which is the Commission (Eurostat), and the NSIs and other national authorities responsible in each Member State for the development, production and dissemination of European statistics.

 

Behind the scenes

Prior to the Hackathon, data was collected for two weeks through mobile devices. Applications were installed on volunteers' phones to track their time use through two different means of data collection:

  • Automated data collection: information picked up by the phone’s existing sensors, such as the gyroscope, accelerometer and magnetometer, to determine orientation and movement. The operating system’s states and events were also tracked to determine activities, such as when a person was on a call or listening to music.
  • Complementary manual data input: hourly pop-up questions about volunteers’ activities, where they were, with whom and what their mood was.

Machine learning algorithms ease manual input by using sensor data to determine which activity is ongoing and to detect when an activity ends. Altogether, 25-30 different types of information were collected, at a rate of 20 pieces of information per second.
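To make the idea of detecting when an activity ends more concrete, here is a minimal sketch in Python. It is not the method used in the hackathon applications: a fixed threshold on accelerometer magnitude stands in for a trained machine learning model, and the function names, the sample format and the threshold value are all assumptions made for illustration.

```python
from math import sqrt

GRAVITY = 9.81  # m/s^2; a phone at rest measures roughly this magnitude


def label_sample(ax, ay, az, threshold=1.5):
    """Label one accelerometer reading as 'moving' or 'still'.

    The threshold (in m/s^2) is a hypothetical value; a real system
    would learn the decision boundary from labelled data.
    """
    magnitude = sqrt(ax * ax + ay * ay + az * az)
    return "moving" if abs(magnitude - GRAVITY) > threshold else "still"


def segment_activities(samples, threshold=1.5):
    """Split a stream of (timestamp_s, ax, ay, az) readings into segments.

    Returns a list of (label, start_time, end_time). A segment ends at the
    timestamp of the first sample that carries a different label.
    """
    segments = []
    current_label = None
    start = None
    last_t = None
    for t, ax, ay, az in samples:
        label = label_sample(ax, ay, az, threshold)
        if label != current_label:
            if current_label is not None:
                segments.append((current_label, start, t))
            current_label, start = label, t
        last_t = t
    if current_label is not None:
        # Close the final segment at the last timestamp seen.
        segments.append((current_label, start, last_t))
    return segments
```

In a smart survey, each segment boundary could be the moment at which the application asks the volunteer to confirm or correct the detected activity, instead of relying only on fixed hourly prompts.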

Special infrastructure and support were needed to handle the vast amount of data collected. The pieces of data collected by the applications were sent to a server deployed in a big data cluster using BDTI.
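A quick back-of-the-envelope calculation gives a feel for the scale. The sketch below assumes that the rate of 20 pieces of information per second applies to each volunteer and that recording ran continuously for the two weeks; neither is stated in the project figures, and the `active_fraction` parameter and the function name are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_volunteer(rate_per_second=20, days=14, active_fraction=1.0):
    """Estimate how many readings one phone produces over the collection period.

    active_fraction is the share of time the app is assumed to be recording
    (1.0 means continuous recording, which is an upper bound).
    """
    return int(rate_per_second * SECONDS_PER_DAY * days * active_fraction)


# Upper bound for one volunteer: 20 readings/s over 14 days.
print(readings_per_volunteer())  # 24192000
```

Under these assumptions a single phone yields about 24 million readings, so even a modest number of volunteers quickly produces a data set that is easier to handle on a cluster than on a single machine.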

The Hackathon itself ran from 8 to 12 March 2019. Seventeen teams of three people each competed to solve the given statistical challenge: devise new indicators on time use. Time use is a common indicator that is well established in both research and official statistics. It is an important indicator for several policy areas and can answer a wide range of questions, such as how much time people from different demographics spend on education and how many hours people work per week.
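As an illustration of what a time-use indicator looks like in practice, the following sketch computes the average daily hours that each demographic group spends on a given activity. The record layout (`person`, `group`, `activity`, `day`, `hours`) and the function name are assumptions for this example and do not reflect the hackathon's actual data model.

```python
from collections import defaultdict


def average_daily_hours(episodes, activity):
    """Return {group: mean hours per person-day spent on `activity`}.

    `episodes` is an iterable of dicts with the keys
    person, group, activity, day and hours (a hypothetical layout).
    Person-days without the activity count as zero hours.
    """
    totals = defaultdict(float)     # hours on the activity, per group
    person_days = defaultdict(set)  # distinct (person, day) pairs, per group
    for e in episodes:
        person_days[e["group"]].add((e["person"], e["day"]))
        if e["activity"] == activity:
            totals[e["group"]] += e["hours"]
    return {g: totals[g] / len(days) for g, days in person_days.items()}


episodes = [
    {"person": "p1", "group": "18-24", "activity": "education", "day": 1, "hours": 4.0},
    {"person": "p1", "group": "18-24", "activity": "work", "day": 1, "hours": 2.0},
    {"person": "p2", "group": "18-24", "activity": "education", "day": 1, "hours": 2.0},
    {"person": "p3", "group": "25-64", "activity": "work", "day": 1, "hours": 8.0},
]
print(average_daily_hours(episodes, "education"))  # {'18-24': 3.0, '25-64': 0.0}
```

Dividing by person-days rather than by the number of episodes keeps the indicator comparable between groups whose members log different numbers of activities.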


BDTI for fast and reliable support

The main difficulty for Eurostat lay in finding a partner that could work with the hackathon’s tight timeline. The BDTI team was able to meet all deadlines, with the data collection successfully set up within a month. The same was true of the technical infrastructure for the hackathon. Furthermore, working with BDTI did not require budgetary planning or time-consuming framework contracts, since all services and support offered by BDTI are free of charge for Europe's public administrations.

The technical burden on the infrastructure during the hackathon also brought challenges, as 60 people worked in the same data centre to process large amounts of data simultaneously. BDTI provided on-site support staff and there were no performance issues with the platform. 

On the last day of the Hackathon, each team presented the prototype it had developed. The winning team came from Poland, followed by Italy and the UK. The winning data prototype compared caloric intake (estimated from pictures of meals) with how many calories were burned (based on activity estimated by the phone’s motion sensors). In October 2019, the winners will have a chance to present further developed versions of their prototypes to Eurostat experts.
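The comparison behind such a prototype can be sketched in a few lines. The winning team's actual method is not described here, so the example below swaps in a common approximation: energy expenditure in kcal is taken as MET value × body weight in kg × duration in hours. The MET values, the activity labels and the function names are illustrative assumptions only.

```python
# Illustrative MET values (assumptions, not taken from the hackathon).
ILLUSTRATIVE_MET = {"still": 1.3, "walking": 3.5, "running": 8.0}


def calories_burned(activity_hours, weight_kg, met_table=ILLUSTRATIVE_MET):
    """Approximate kcal burned as MET x weight (kg) x duration (h), summed over activities."""
    return sum(met_table[a] * weight_kg * h for a, h in activity_hours.items())


def calorie_balance(meal_kcal, activity_hours, weight_kg):
    """Intake minus expenditure; a positive result means a calorie surplus."""
    return sum(meal_kcal) - calories_burned(activity_hours, weight_kg)


# One hypothetical day: three meals estimated from photos,
# activity durations estimated from motion sensors.
meals = [600, 800, 900]
day = {"still": 22, "walking": 2}
print(round(calorie_balance(meals, day, weight_kg=70)))
```

The two inputs come from different sources, meal photos for intake and motion sensors for activity, which is what makes this kind of indicator hard to obtain from a traditional diary-based survey.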

Once the prototypes are fully developed, they can be adopted by official statistics bodies, modernising and improving the way data collection and reporting are done. Statistics based on big data can be more accurate than those gathered through traditional survey methods, such as keeping a diary or maintaining manual logs. With BDTI, it is much easier to explore data and find new indicators for the creation of smart surveys.


“With the BDTI, we could very quickly set up a substantial infrastructure for the European Big Data Hackathon 2019 and get great support!”

Fernando Reis, Big data statistician, Eurostat


What's next?

In the future, Eurostat foresees developing frameworks for collecting data using a wide range of applications to facilitate the implementation of smart surveys. For example, another smart survey could be developed for household consumption, where spending is machine analysed through pictures of grocery receipts.

Given the positive experience of working with the BDTI team, Eurostat is currently using BDTI again to develop and test big-data-based smart surveys and frameworks.



How can CEF help you?

If you’re interested in using BDTI for a project of your own, we would be happy to help you. The support services provided by CEF are described on our website and available to all. Visit us at BDTI to learn more.



Sources

  1. Eurostat Overview, European Commission, accessed 28 June 2019, <https://ec.europa.eu/eurostat/about/overview>