Policies and societal action in the age of networked, citizen-driven information: Big Data
In their book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Mayer-Schönberger and Cukier argue that the '…benefits [of Big Data] to society will be myriad as Big Data becomes part of the solution to pressing global challenges like addressing climate change, eradicating disease and fostering good governance and economic development.'

Decisions in the age of networked Big Data: Will they be different?
Pervasive ICT forms an unprecedented medium for social interaction and sensing, and opens up new ways to assimilate, analyze, and interact with functioning societal systems. Crowdsourcing, and the technologies that support it, plays an integral role in bringing Big Data to bear on policy and societal challenges: individuals, small communities, and organizations can now (i) provide an important source of near real-time information, (ii) be involved in the active computation process, and (iii) participate in the entire decision-making process in ways that were not possible before.
Can these changes really spell the end of certain kinds of exclusivity in the way government authorities reach policy decisions? The development of protocols and information-sharing schemes for networked decision making includes methods that allow individuals to convey their preferences, thoughts, and ideas to traditional decision makers. This changes the dynamics of a governance that has traditionally been in the hands of centralized, hierarchical authorities.

Trends and weak signals contributing to this future: The Era of Big Data
An accelerating trend in society, policy, and science

A common definition of "Big Data" identifies five shared factors, the "5 Vs":

  • Volume (data in thousands of Exabytes). Data volume is the primary attribute of big data. 
  • Velocity (data is generated dynamically and processed in real time). Velocity describes the speed and/or frequency of the collection and processing of relevant data. Customer retention, quality of experience and fraud management are a few business areas that benefit from the fast exploitation of data. 
  • Variety (data is unstructured, unlike relational databases). Variety refers both to where the data is sourced and to its format. Newer types of data, such as social media posts, web logs, and click streams, come from the web. Big Data also includes unstructured and semi-structured media types such as voice and text.
  • Veracity (the quality, authenticity, and trustworthiness of data).
  • Value. A more useful way to think about Big Data is through this fifth V. The McKinsey Global Institute suggests that Big Data could generate €250 billion in potential annual value for Europe's public sector administration, more than the GDP of Greece.

DG CONNECT is supporting initiatives on Big Data to enhance our ability to extract knowledge from large, heterogeneous data sets. A programme on 'knowledge and information management in data-intensive and data-dependent sectors' is underway in the 'Data Value Chain' unit. The social and policy use of Big Data is also addressed in the research and policy programme on 'Global Systems Science'.

Today, we are experiencing a major shift in decision making driven by several factors: unprecedented amounts of data from a variety of sources (smart grids, mobility data, sensor data, data from social media, census data…) on a variety of socio-economic, technological, and ecological systems; our vastly increasing ability to store and perform computation over very large data sets; the appearance of cloud services and other collaborative platforms; and in particular the mobile revolution allowing everyone anytime and anywhere to send and receive information.

The McKinsey Global Institute reports that in 2010 more than 4 billion people (60 percent of the global population) were using mobile phones, and 12 percent of them had smartphones, whose penetration is growing at more than 20 percent a year. In addition, more than 30 million networked sensor nodes are now present in the transportation, automotive, industrial, utilities, and retail sectors, and their number is increasing at a rate of more than 30 percent a year.

The overall trend is that the world is becoming more and more interconnected through globally and continuously available data of all sorts. Even larger amounts of data will be generated in the future when the Internet of Things (IoT) becomes a reality. IoT refers to sensors, actuators, and data communications technology built into physical objects of all sorts, allowing those objects to be tracked, coordinated, or controlled across a data network or the Internet. The number of connected devices is projected to reach around 50 billion by 2020.

Trends and the impacts of decisions spread very fast over various global webs, including the World Wide Web itself. The capacity to handle, process, and analyse massive amounts of data is therefore becoming critical. In the U.S. alone, between 140,000 and 190,000 more analytical positions and 1.5 million more data-savvy managers will be needed to make sense of the growing volumes of data. Indeed, the Web creates a 'mirror world' made of the digital traces of humans and machines. As a consequence, fragments of potential knowledge are scattered across the digital data landscape, ready to be used - and potentially misused - by anyone with the technical knowledge to do so.

Pervasive ICT: A disruptive trend in society and policy?
We as humans are creating a socially embedded cyber-physical information system: a network of devices embedded in society and in the physical world. Smartphones and wearable devices are tearing down the digital-physical barrier, simultaneously creating the ability to digitally track the state, location, and preferences of large numbers of individuals, and empowering those individuals to be networked, to be informed, and to inform in a timely fashion, at both the individual and collective level.

When this trend is linked with new computing capacities (HPC, cloud, mobile, grid, ...), the 'pervasive ICT' vision turns into reality: computing and communication anytime, anywhere, and on any device.

This trend has two important ramifications:
  • Humans are an integral part of the global information processing network; not only do they consume its services, they are in fact computing and 'sensing' themselves, and providing the computed results and data to other humans.
  • These pervasive, socially embedded systems produce enormous amounts of data and information that is constantly processed, refined, analyzed, and used. 'Big Data' becomes a societal fact, and so does the need to process these data.

Twitter and Facebook have demonstrated the enormous power of user-driven data creation combined with scalable analysis algorithms. In these 'social computation' applications, network propagation and analysis that provide real-time, user-relevant feedback are critical. The 'social graph' that many recognize as 'revolutionary' is a hybrid, combining interaction-driven user data and real-time analysis capabilities in a feedback loop.
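
As a toy illustration of this feedback loop (a minimal sketch with invented users, scores, and update rule, not the actual Twitter or Facebook algorithms), the Python fragment below propagates activity scores over a small social graph and routes the results back to users as recommendations:

    # Minimal sketch of 'social computation': user activity flows through
    # the social graph, and the analysis is fed back to users. All data
    # and scoring rules here are illustrative.

    follows = {                       # who follows whom (hypothetical)
        "alice": ["bob", "carol"],
        "bob":   ["carol"],
        "carol": ["alice"],
    }
    activity = {"alice": 1.0, "bob": 3.0, "carol": 2.0}   # e.g. recent posts

    def propagate(scores, graph, damping=0.85, rounds=10):
        """PageRank-style propagation of activity scores over the graph."""
        for _ in range(rounds):
            scores = {
                user: (1 - damping) + damping * sum(
                    scores[u] / len(graph[u])
                    for u in graph if user in graph[u])
                for user in graph
            }
        return scores

    relevance = propagate(activity, follows)
    # Feedback step: each user is shown the most 'relevant' account they follow.
    for user, followees in follows.items():
        top = max(followees, key=lambda f: relevance[f])
        print(f"{user}: recommended account -> {top}")

The essential point is the loop: interactions update the graph, propagation over the graph updates the scores, and the scores in turn shape the next round of interactions.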

Possible Policy Responses:
Towards data-centric and user-centric online supercomputing
The need to process huge amounts of data calls for new types of computing infrastructure: scalable and distributed, real-time and continuous, data-centric and model-driven, and user-centric.

Data is abundant: "more than you can store or process in one place". This means that computation needs to be distributed, online, over a network of computing nodes: we must take the computation to the data rather than the reverse, as in cloud computing.
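
A minimal sketch of this principle, assuming the data is already partitioned across (here simulated) nodes: each node computes a compact local summary where the data lives, and only those small partial results travel over the network to be merged:

    # Sketch of 'taking the computation to the data': each node holds a
    # data partition and runs the same local aggregation in place; only
    # the small per-node summaries are shipped and merged. The partitions
    # below are illustrative.

    from collections import Counter

    partitions = [                        # data resident on three nodes
        ["flu", "flu", "storm"],          # node 1
        ["storm", "flu"],                 # node 2
        ["quake", "storm", "storm"],      # node 3
    ]

    def local_count(partition):
        """Runs on the node that stores the data; no raw records move."""
        return Counter(partition)

    partials = [local_count(p) for p in partitions]   # only counters cross the network
    total = sum(partials, Counter())                  # single, cheap merge step
    print(total)    # Counter({'storm': 4, 'flu': 3, 'quake': 1})

The same split between a data-local map step and a small merge step underlies production frameworks such as MapReduce and its descendants.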

IT tools for use in policy should be driven by a combination of data and appropriate theories and models. This new model of computation differs from traditional ones in that producing, analysing, processing, and curating data are integral parts of the computation. The results of data analysis must be routed back to the user, who in turn will interact with them again. In this way, an ecology of services and applications based on the innovative use and refinement of data is emerging.

The traditional clientele of high-performance computing (HPC) was in engineering and physics. To take full advantage of the emerging user-centric paradigm, we need effective ways for a variety of users to interact with models and data without becoming computing experts. This requires a joint effort across interaction design, commodity (low-energy-consumption) hardware cleverly networked in data centers, and traditional HPC.

The trend towards Big Data means that policy makers and regulators will face challenges from Big Data in their own policy areas. There are, for instance, a variety of regulatory issues (access rights to public and private data, for example), which are addressed in the 'Data Value Chain' unit of DG CONNECT. This is also why part of the ambition of the programme on 'Global Systems Science' in DG CONNECT is to work closely with policy directorates on the use of models and data in their policy domains.

Two use scenarios for Big Data in policy
(i) Policy to tackle pandemics is today strongly based on the analysis of massive amounts of (medical and non-medical) data and on encoding them into models. Uses include early warning, mitigation of disease spread, scenario-based reasoning about health policies, and real-time decision support for first responders. Such services are increasingly complemented by non-medical services such as Google Flu Trends.
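
For illustration, a minimal discrete-time SIR (susceptible-infected-recovered) model of the kind such services build on; the population and the parameters beta and gamma below are invented, whereas operational systems estimate them from incoming medical and non-medical data streams:

    # Minimal discrete-time SIR epidemic model. All numbers are
    # illustrative; real decision-support systems fit beta (transmission
    # rate) and gamma (recovery rate) to live data streams.

    def sir(population, infected0, beta, gamma, days):
        s, i, r = population - infected0, float(infected0), 0.0
        history = []
        for _ in range(days):
            new_infections = beta * s * i / population
            new_recoveries = gamma * i
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
            history.append(i)
        return history

    curve = sir(population=1_000_000, infected0=10,
                beta=0.30, gamma=0.10, days=180)    # R0 = beta/gamma = 3
    peak_day = max(range(len(curve)), key=curve.__getitem__)
    print(f"peak of ~{curve[peak_day]:,.0f} infections on day {peak_day}")

An early-warning service would, for example, raise an alert as soon as the projected curve crosses hospital capacity.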

(ii) Financial markets and their regulation: Financial actors now routinely analyse considerable amounts of network and transactional data. In stock markets, for instance, this has led to the risk of herd behaviour (via the automatic triggering of orders) and to discussions about regulating algorithmic trading. Financial crises are triggered by complex webs of transactions between banking institutions. These examples show that regulators must understand the role and use of data in such systems as a step towards effective regulation. A recent note from DG MARKT to DG CONNECT therefore points to the importance of research to better understand issues like financial contagion, and of dynamic network models to analyse systemic risk.
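
As a minimal sketch of such a dynamic network model (with invented balance sheets and a deliberately simple default rule), the fragment below cascades defaults along an interbank exposure network: a bank fails once its write-downs on claims against failed counterparties exceed its capital buffer:

    # Toy interbank contagion model: exposures[a][b] is what bank a is
    # owed by bank b. A bank defaults when losses on claims against
    # defaulted counterparties exceed its capital. Figures are illustrative.

    exposures = {
        "A": {"B": 50, "C": 20},
        "B": {"C": 60},
        "C": {},
    }
    capital = {"A": 40, "B": 30, "C": 10}

    def cascade(initial_defaults):
        defaulted = set(initial_defaults)
        changed = True
        while changed:
            changed = False
            for bank, claims in exposures.items():
                if bank in defaulted:
                    continue
                losses = sum(v for cp, v in claims.items() if cp in defaulted)
                if losses > capital[bank]:
                    defaulted.add(bank)
                    changed = True
        return defaulted

    print(cascade({"C"}))   # C topples B (60 > 30), then A (50 + 20 > 40)

Even this toy version shows why regulators need the transaction network itself rather than per-institution data alone: bank A would survive C's failure in isolation (20 < 40) but not the cascade through B.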

Evidence:

Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think. London: John Murray, 2013.
Thinking Big with Big Data Analytics: http://www.intercomms.net/issue-20/nsd-4.html
McKinsey Global Institute, Big Data: The Next Frontier for Innovation: http://www.mckinsey.com/insights/business_technology/big_data_the_next_f...
McKinsey Global Institute, Disruptive Technologies: http://www.mckinsey.com/~/media/mckinsey/dotcom/insights%20and%20pubs/mg...
United Nations Global Pulse, Big Data for Development: Challenges and Opportunities (White Paper): http://www.unglobalpulse.org/BigDataforDevWhitePaper


Challenges: 
  • Important S&T challenges include research in algorithms, probability and statistics, and optimization that takes uncertainty (the level of veracity and variety) into account so as to provide viable support for decision making.
  • A core technical and scientific challenge is to develop the knowledge to process these pieces of information [big data], interlink them, and to integrate this information with models and theories to enhance our understanding. 
  • A core societal and policy challenge will be to use these data in a way that citizens themselves agree are to their collective benefit.
  • A core economic challenge will be ensuring that the benefits of Big Data-related innovation and productivity growth are distributed evenly across society.
  • What are the prospects for societal and policy uses of Big Data? The rise of citizen-driven 'Big Data' opens a range of scientific, technological, social, ethical, and policy challenges that will need to be addressed coherently, since the rise of 'Big Data' is an ongoing and unstoppable process. In particular, these challenges need to be addressed with an all-encompassing view of how 'Big Data' relates to global challenges for society.

Opportunities: 
  • Research by McKinsey Global Institute suggests that data can create significant value for the world economy, enhancing the productivity and competitiveness of companies and the public sector. 
  • Making relevant data more readily accessible across otherwise separated departments in the public sector could sharply reduce search and processing time and therefore increase efficiency.
  • Human decision making could be supported by sophisticated algorithms which unearth valuable insights that would otherwise remain hidden.
  • McKinsey’s report ‘Big data: The next frontier for innovation, competition, and productivity’ argues that customers and citizens are both direct and indirect beneficiaries of Big Data-related innovation. The use of big data could enable improved health outcomes, higher-quality civic engagement with government, lower prices due to price transparency, and a better match between products and consumer needs. 
  • Segmenting populations in order to tailor services to specific demographic groups, for greater effectiveness, efficiency, and citizen satisfaction.
  • A core environmental opportunity enabled by Big Data and the Internet of Things is monitoring ocean systems, improving energy efficiency in cities, or even predicting and preventing earthquakes and other natural disasters.
  • A core societal opportunity is greater public safety: in the future, a growing number of cameras and sensors, and the data they generate, will allow police officers to monitor potential criminals continuously and to prevent crimes.
  • A key economic opportunity will be the creation of hundreds of thousands of jobs for data scientists and data analysts.
  • Greater inclusion of citizens in policy making.
Questions: 
From a social use perspective, the crucial questions are:
  • What data should be open, and what data proprietary?
  • What level of accuracy and completeness will different groups require to trust data?
  • Who has the knowledge to use data, and for what purpose?
  • Is it always possible to reconcile privacy (e.g. protecting individual identity) and use for public benefit (e.g. use of health data to prevent or cure disease)?
Timeframe: 
2030