Sašo Džeroski's MAESTRA project team developed software that helps machine learning systems deal with the complexity of big data. The software is now public. How can the novel MAESTRA methods influence the world around us?

The molecular pathways targeted by dovitinib, one of the novel lead compounds for treating tuberculosis identified by using MAESTRA methods.

What are the challenges in machine learning and data mining we addressed in MAESTRA?

The goal of the MAESTRA project was to develop methods for processing big and complex data so that the data can be used for learning and prediction. Data complexity makes machine learning quite difficult. It can take the form of unstructured data, very large datasets, data streaming at high rates, incompletely or partially labelled data, or heterogeneous data coming from various sources. The hardest case is dealing with several of these at the same time.

Dealing with each of these challenges individually has been the central topic of active research in areas such as structured-output prediction, mining data streams, semi-supervised learning, and mining network data. Our goal was more ambitious: data mining methods that can handle several complexity aspects simultaneously, such as methods for predicting structured outputs on data streams. And we did it.

So, what methods have we developed within MAESTRA, and on which challenging real-life problems did we showcase them?

The MAESTRA project developed a variety of novel predictive modelling methods capable of simultaneously addressing several complexity aspects. These methods now exist as software implementations able to address massive sets of data or streams of data, incompletely labelled data, and data coming from different sources and contexts. We also showed how to apply the developed methods in life sciences, sensor networks, multimedia, and social networks.

Now let's get technical and more specific. We introduced many tree-based (e.g., option trees) and rule-based methods for predicting structured outputs (e.g., for multi-target regression and multi-label classification) in both the batch and streaming settings. We also came up with effective methods for the semi-supervised versions of these tasks, which often perform much better than their supervised counterparts (and almost never worse). Finally, we presented ontologies of data types and data mining methods covering a wide range of data mining tasks.
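To make the idea of a tree-based method for multi-target regression more concrete, here is a minimal sketch in Python. It is not the MAESTRA software and does not reproduce any of its algorithms; it only illustrates one common heuristic, assumed here for illustration: choosing the split that minimises the variance summed over all target variables, so that a single tree node predicts several targets at once. All names (`total_variance`, `mean_vector`, `fit_stump`, `predict`) and the toy data are hypothetical.

```python
from statistics import fmean


def total_variance(targets):
    """Sum of per-target (population) variances over a list of target vectors."""
    if not targets:
        return 0.0
    total = 0.0
    for column in zip(*targets):  # one column per target variable
        mean = fmean(column)
        total += fmean([(v - mean) ** 2 for v in column])
    return total


def mean_vector(targets):
    """Per-target mean: the vector a leaf predicts."""
    return [fmean(column) for column in zip(*targets)]


def fit_stump(X, Y):
    """Fit a one-split tree that minimises the weighted total variance.

    Assumption: targets are on comparable scales. A fuller implementation
    would typically normalise each target before summing variances.
    """
    best = None  # (score, feature index, threshold)
    for feature in range(len(X[0])):
        values = sorted({x[feature] for x in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in zip(X, Y) if x[feature] <= threshold]
            right = [y for x, y in zip(X, Y) if x[feature] > threshold]
            score = (len(left) * total_variance(left)
                     + len(right) * total_variance(right)) / len(Y)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:
        raise ValueError("no split possible: every feature is constant")
    _, feature, threshold = best
    left = [y for x, y in zip(X, Y) if x[feature] <= threshold]
    right = [y for x, y in zip(X, Y) if x[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": mean_vector(left),
        "right": mean_vector(right),
    }


def predict(stump, x):
    """Return the predicted target vector for one example."""
    side = "left" if x[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


# Toy data: two input features, two targets predicted jointly.
X = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
Y = [[10.0, 1.0], [12.0, 1.2], [30.0, 3.0], [32.0, 3.4]]
stump = fit_stump(X, Y)
print(stump)
print(predict(stump, [1.5, 0.0]))
```

The point of the design is that one model serves all targets: the split is scored on the targets jointly, and each leaf stores a whole vector rather than a single number. A full tree would apply `fit_stump` recursively to the left and right subsets.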

How are these innovations useful in real life?

The potential of the methods developed within MAESTRA is enormous. Because they can handle the various aspects of complexity, they can now be applied in ways that were not possible before: instead of simplifying the tasks at hand to make them fit existing machine learning and data mining methods, we can now thrive on the complexity and exploit it to obtain better predictive models. This opens many challenging research avenues that can lead to great new scientific achievements.

In terms of immediate applications, the methods have already been used to successfully predict gene functions in thousands of bacterial genomes from data derived only from genome sequences, predict the phenotypes of micro-organisms from their genotypes, and identify novel lead compounds for treating tuberculosis and salmonella. They were also used to predict both the production and the consumption of electrical energy from different kinds of sensor data in different contexts (e.g., production of photovoltaic energy, thermal power consumption of the Mars Express orbiter), as well as in intelligent transport (e.g., to predict equipment failure in trains and taxi demand). In multimedia and social networks, they were used to improve the accuracy and efficiency of image annotation and retrieval, as well as for sentiment analysis.

The MAESTRA resources and main results are publicly available!

The major dissemination event of MAESTRA was the Summer School on Mining Big and Complex Data (http://maestra-project.eu/school/), with lectures on MAESTRA results by project participants and by experts from outside the consortium, covering different MAESTRA-related topics. At the Summer School, we showcased the methods developed within the project as well as their applications to practically relevant problems. The lectures were recorded and are freely available. Moreover, MAESTRA has produced a variety of resources, including publicly available software, ontologies, and demos, available through the project website. We invite you to explore these resources.

Five MAESTRA papers received best paper awards at different conferences and a MAESTRA team from the Jožef Stefan Institute won the first ESA data mining challenge.

Published: 19 July 2018
Last update: 24 July 2018