EU Science Hub

Competence Centre on Text Mining and Analysis

Rationale

Accurate, targeted, and timely information is needed by EU institutions at almost every stage of the decision-making process. However, the information that policy makers require is increasingly embedded in large amounts of text available on the Internet (e.g. traditional or social media) or in large public or proprietary document collections. The sheer volume of this text makes it nearly impossible to extract the relevant information manually. Text mining and analysis tools are therefore needed to address not only volume but also timeliness, so that the right information reaches the decision-making process in the proper format, across a variety of contexts.

Text mining and analysis (TMA) plays an important role in many application domains relevant to EU institutions, including the European Commission's directorates-general (DGs), for example:

  • media monitoring of political current affairs (DG Communication);
  • targeted information for crisis rooms to improve the EU’s prevention, preparedness and response capabilities (DG European Civil Protection and Humanitarian Aid Operations (ECHO), European External Action Service (EEAS));
  • information used for security purposes (DG Migration and Home Affairs, DG Human Resources and Security);
  • business intelligence based on framework proposals (research DGs and executive agencies);
  • research and innovation monitoring (research DGs and agencies);
  • monitoring of health-related issues (DG Health and Food Safety, European Centre for Disease Prevention and Control (ECDC), European Food Safety Authority (EFSA));
  • monitoring of news in the financial sector (DG Financial Stability, Financial Services and Capital Markets Union, DG Economic and Financial Affairs).

Text mining techniques and tools are very much needed throughout the EU institutions, but they are highly specialised and not directly accessible or usable by decision makers or by the policy domain experts who support them. Using these tools and techniques to provide decision makers reliably with timely information requires a range of complementary skills: analysis; research and development of solutions based on computational linguistics; and deployment and operation of the systems, grounded in sound IT knowledge and practices. All of these skills are necessary, and small, isolated groups are unlikely to cover every one of these aspects or to reach the required level of expertise.


Benefits

Establishing a Competence Centre on TMA therefore makes it possible to:

  • Provide the expertise needed to deliver practical solutions based on TMA: computational linguistics research, applied IT and support.
  • Maintain, expand and develop knowledge and experience of TMA in an operational environment.
  • Provide sufficient critical mass to support research in TMA.
  • Provide sufficient capacity to respond to relevant ad hoc requests.
  • Promote the harmonisation of tools and techniques, allowing for better information exchange between users (e.g. DGs using MyNews; the same platform for the Council, the Commission, the EEAS and the European Parliament).
  • Leverage economies of scale by deploying the same technology and tools. This may be further augmented by providing institutional support for small-scale media monitoring activities to a large number of EU offices and agencies.
  • Provide a clear point of reference for TMA and act as a solution broker for TMA needs.
  • Provide a one-stop shop for tools, services and training for the EU institutions.
  • Provide advice on the use of TMA techniques for information extraction.
  • Support or conduct technical negotiations with external providers of structured and unstructured textual data.
  • Reduce the number of external interfaces to data providers.
  • Together with Eurostat, organise the Data4policy community within the Commission and ensure interaction with the Data4policy community outside the institutions.

Contact

For more information or to get in touch, please contact us by email at: JRC-TMA-CC@ec.europa.eu