Tackling big data challenges in shades of grey
EU-funded researchers apply data mining concepts that overcome the challenge of working with limited and incomplete information.
Combining the expertise of Chinese and European researchers, the EU-funded GS-A-DM-DS project has led to significant advances in the reliability and applicability of data mining algorithms based on grey systems theory, which enables valuable insights and predictions to be gained from incomplete data.
Grey systems models developed by the GS-A-DM-DS researchers were successfully used to help select a research and development team to work on China's first domestically built commercial passenger jet, the C919, which made its maiden flight in May 2017. The grey systems approach was chosen because China had no prior experience in building such an aircraft and therefore had limited reference data.
A Chinese renewable energy company has also used models advanced in the project to predict when a wind turbine gearbox will fail. This has enabled it to prolong the life of its turbines by 36 %, resulting in fewer shutdowns and saving tens of millions of euros.
"The C919 aircraft and the wind turbines are just two examples of how our models have been used in the real world to great effect," says project manager Yingjie Yang of De Montfort University in the UK. "As our work focuses on the theory rather than the practical applications, we hope to show that grey systems models are a viable option for companies, governments and policymakers in areas with limited or poor-quality data, such as socio-economic analysis, healthcare, climate change and complex R&D projects."
In GS-A-DM-DS, Yang worked with Sifeng Liu, one of the world's leading experts on grey systems theory, to propose novel prediction and decision-making models that provide more reliable results with lower computational requirements.
They also developed algorithms to help users choose the right grey systems models for their applications, and sought to expand knowledge and access to the emerging field, which has been little explored in Europe until now.
In contrast, grey systems have delivered considerable success in China, where the theory was introduced in 1982 by Liu's mentor Julong Deng. First used for macroeconomic, food security and healthcare forecasting in order to overcome a shortage of public and historical records, grey systems theory has since found applications in a wide range of industries.
Today, the models are being used by oil companies to successfully predict where to drill new wells, saving considerable sums of money on speculative excavations; transport authorities are using the methods to estimate traffic on highways before they are built; and multinational brands are exploring the techniques to gain insights into how customers feel about their products.
Solutions for real-world problems
Grey systems theory defines situations with no information as "black" and those with perfect information as "white", while accepting that neither of those situations occurs in the real world: reality is instead a "grey" area somewhere in between, in which some information is known and some is unknown. Hence, grey systems analysis will not yield one optimal solution, but rather a range of good solutions appropriate for real-world problems, based on accessible data and accounting for uncertainty.
"Current data mining research is focused almost exclusively on big data using full data sets, but with incomplete data these models fail: probability will not work, statistics will not work, while grey systems models can give you a good result with as little as four data points," says Yang. "In this sense, big data is not always better."
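To make the "four data points" claim concrete, here is a minimal sketch of GM(1,1), the classic first-order, single-variable grey forecasting model from the grey systems literature. The article does not say which models the project used, so the choice of GM(1,1) is an assumption, and the function names gm11_fit and gm11_forecast and the sample numbers are hypothetical.

```python
import math


def gm11_fit(series):
    """Estimate the GM(1,1) parameters (a, b) from a short, positive series."""
    values = [float(x) for x in series]
    n = len(values)
    if n < 4:
        raise ValueError("this sketch expects at least four data points")

    # Accumulated generating operation: running sum of the raw series.
    acc, total = [], 0.0
    for x in values:
        total += x
        acc.append(total)

    # Background values: mean of each pair of consecutive accumulated values.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, n)]
    y = values[1:]

    # Ordinary least squares for y[k] = slope * z[k] + b, where slope = -a,
    # solved directly from the 2x2 normal equations.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate series: cannot fit GM(1,1)")
    slope = (m * szy - sz * sy) / det
    b = (sy - slope * sz) / m
    return -slope, b


def gm11_forecast(series, steps):
    """Forecast the next `steps` values of the raw series with GM(1,1)."""
    a, b = gm11_fit(series)
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; series is near constant")
    first = float(series[0])
    n = len(series)

    def acc_hat(k):
        # Time response of the accumulated series at index k (0-based).
        return (first - b / a) * math.exp(-a * k) + b / a

    # Differencing the accumulated predictions restores raw-scale forecasts.
    return [acc_hat(k) - acc_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    # Hypothetical four-point series growing by 10 % per step.
    print(gm11_forecast([100.0, 110.0, 121.0, 133.1], steps=2))
```

On this made-up series the first forecast lands within about one percent of 146.41, the value that continued 10 % growth would give. The fit needs no distributional assumptions, only a short, positive and roughly exponential sequence, which is one reason such models are attractive when records are scarce.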
The GS-A-DM-DS researchers, who received support from the EU's Marie Skłodowska-Curie actions (MSCA) programme, are sharing the results of their studies with academics and industries around the world, and have published more than 30 research papers in leading international academic journals.
They have also established an international association on grey systems and uncertainty analysis to support ongoing research in the field, while Yang has been selected as one of the 10 short-listed promising scientists in the Communicating Science category of the MSCA 2017 Prizes.
"Our work has the potential to play a significant role in the development of grey systems and data mining in China and Europe, potentially reinforcing the research excellence of the EU in data analysis and decision-support technologies," Yang says.