
European Commission Digital

CEF Big Data Building Block new release and self-paced labs announcement



The European Commission is pleased to announce the launch of the CEF Big Data Test Infrastructure (BDTI) self-paced labs.

The BDTI Building Block is a ready-to-use virtual environment in which public administrations can analyse large datasets. This helps administrations get the most out of their data, with key insights that can drive policy-making, such as how to safely lift coronavirus restrictions by modelling the flow of citizens through a city.

However, not everyone who wants to use the powerful tools provided by BDTI is a data scientist. The BDTI team realised that public administrations lacking technical expertise could benefit greatly from additional guidance, both in getting started with basic data science frameworks and in moving their analytical environment into an in-house production environment after a successful pilot.

This is why the Commission has created the new BDTI self-paced labs, as well as a new advisory service built into the latest release of the BDTI Building Block.

BDTI self-paced labs 

The self-paced labs are a set of user-friendly tutorials for onboarding non-technical people looking to run a BDTI pilot. Thanks to these labs, any public administration can become familiar with data science methodologies and tools, regardless of whether they have a dedicated department for such projects. 

The labs provide three main services: 

  • An interactive learning experience: Jupyter notebooks help you become familiar with basic coding and the methodologies needed to start analysing large datasets;
  • Coding frameworks: a set of popular Python frameworks for working with datasets;
  • Open source data: the labs provide large datasets for users to experiment with, sourced from the EU Open Data Portal and Kaggle, a machine-learning community.

There are two main types of self-paced lab available for pilot users, depending on the size of the datasets you want to work with.

For medium-sized datasets, Machine Learning labs introduce popular Python packages for data exploration and for tackling three kinds of machine learning problem: regression (predicting numerical values in your dataset), classification (predicting predefined classes in your dataset) and clustering (grouping similar members together).

For very large datasets (for example, over 30 GB), Big Analytics Data Exploration labs introduce Big Data frameworks such as Apache Spark and Apache Hive for data exploration.

Advisory service

The beginning of 2021 also brings the latest release of the overall BDTI package, with improved infrastructure and a new advisory service.

Once pilot users have become familiar with big data techniques, the prospect of "off-boarding" from the BDTI technical team's support can be daunting.

The new advisory service provides guidance on how users can take the insights gained from their BDTI pilot forward by implementing a similar virtual infrastructure for data science and big data techniques in an in-house production environment.

Want to get the most out of your data?