Workshop: How AI can help in becoming a Multilingual Enabler
Date: 17.10.2018, 09:00 - 17:00
Place: EC Building BU 25, Auderghem
Luc Meertens, Welcome (presentation)
Philippe Gelin, Objectives of DG Connect
The overall objective is to contribute to the development of CEF Automated Translation as a “multilingualism enabler” for CEF DSIs, online services linked to CEF DSIs and other relevant public online services. The specific objectives are (i) to gather information on additional needs of the CEF DSIs and public services; (ii) to analyze the range of services which could extend CEF Automated Translation and (iii) to support CEF DSIs and related systems with a view to maximizing their use of CEF Automated Translation services.
Cristina Espana i Bonet, The role of AI within Natural Language Understanding (presentation)
Natural language processing (NLP) is one of the most challenging technologies in artificial intelligence (AI). Recent advances in deep learning and the growing need for fast analysis of human languages make NLP a very active research field that is starting to achieve "human performance" in some tasks. This talk introduces the concepts of AI and NLP and shows inherent characteristics of natural languages that make them complex for machines. I will present the basic approaches to natural language processing and understanding and, afterwards, explain the notions of word embeddings, neural networks and deep learning. This background is needed to grasp the success of current systems and what can be achieved in the near future.
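As a rough illustration of the word-embedding idea mentioned above, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are invented for this example only; real embeddings are learned from large corpora and typically have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, made up for illustration (not taken from any trained model).
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

# Related words should score higher than unrelated ones.
related = cosine(vectors["king"], vectors["queen"])
unrelated = cosine(vectors["king"], vectors["apple"])
```

With these toy values, "king" is closer to "queen" than to "apple", which is the kind of regularity that learned embeddings are meant to capture.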
Sara Szoc, Anonymization explained (presentation)
As the field of language technology relies on large amounts of data to apply advanced machine learning algorithms, understanding the benefits, challenges and risks that come with using these data is essential. In this light, we discuss the concept of data anonymization. It is the process of manipulating datasets in such a way that no sensitive information pertaining to individuals or organizations can be learned. Its main goal is to protect privacy and confidentiality, and to ensure compliance with all relevant regulations. In this talk, we give an overview of tools and techniques that can help to anonymize data.
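One of the simplest anonymization techniques is pattern-based masking. The sketch below is a minimal example under stated assumptions: the two regular expressions and the placeholder tags are hypothetical and cover only e-mail addresses and phone-like digit sequences. Names and other identifiers would need additional techniques, such as named entity recognition.

```python
import re

# Hypothetical patterns for illustration; real tools cover many more entity types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s/-]{7,}\d")

def anonymize(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

masked = anonymize("Contact jan.peeters@example.org or +32 2 123 45 67.")
```

Masking with typed placeholders rather than deleting the text keeps sentences readable, which matters when the anonymized data is reused for training language technology.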
Tom Vanallemeersch, Classification explained (presentation)
Classification is the automatic assignment of a specific class or category to objects. It can be applied in a great variety of scenarios, e.g. classifying documents according to their topic, assigning grammatical categories to words, tagging paragraphs with a sentiment label, etc. A related technique is the automatic grouping ("clustering") of text according to similarity without using a predefined set of categories. In this talk we address some of the limitations of classification and clustering, briefly discuss the link with anonymization, and suggest potential ways for DSIs to apply or develop tools.
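To make document classification concrete, here is a minimal sketch of a naive Bayes text classifier with add-one smoothing, written with the standard library only. The labels and training sentences are invented for illustration, and naive Bayes is just one possible technique, not necessarily the one covered in the talk.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count word frequencies per class from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Return the most probable class under naive Bayes with add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior plus the sum of smoothed log likelihoods of each word.
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data with two hypothetical categories.
examples = [
    ("refund for a damaged parcel", "consumer"),
    ("the seller refused a refund", "consumer"),
    ("court ruling on appeal", "legal"),
    ("the judge dismissed the appeal", "legal"),
]
model = train(examples)
label = classify("refund refused by seller", *model)
```

A production system would need far more training data, proper tokenization and an evaluation on held-out documents.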
Diana Maynard, Question-Answering explained (presentation)
Question-Answering (QA) is typically seen as a specialized kind of document search: instead of returning a ranked list of documents in response to one or more keywords, the system returns a precise answer (a word or phrase) in response to a question expressed in natural language. This requires a number of NLP and information retrieval components in order to understand and categorize the question, find the possible set of answers in a collection, and rank them. This talk gives an overview of the task of QA and its importance in the Language Technology landscape. After briefly outlining the key NLP technologies, we give some examples of real applications. Finally, we consider the outlook for the future and the most important directions for further research.
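The first steps of the pipeline described above (categorizing the question, then finding and ranking candidate passages) can be sketched as follows. This is a simplified illustration under stated assumptions: the question types, the stopword list and the word-overlap ranking are hypothetical stand-ins, and the final step of extracting the exact answer phrase from the top passage is omitted.

```python
import re

# Small illustrative stopword list; real systems use larger, language-specific lists.
STOPWORDS = {"the", "a", "an", "is", "of", "in", "to",
             "what", "who", "where", "when", "which"}

def question_type(question):
    """Guess the expected answer type from the question word."""
    first = question.lower().split()[0]
    return {"who": "PERSON", "where": "PLACE", "when": "DATE"}.get(first, "OTHER")

def tokens(text):
    """Lowercased content words of a text, as a set."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}

def rank_passages(question, passages):
    """Order passages by how many content words they share with the question."""
    q = tokens(question)
    return sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)

question = "Where is the workshop held?"
passages = [
    "Lunch is served at noon.",
    "The workshop is held in Auderghem.",
]
expected = question_type(question)
best = rank_passages(question, passages)[0]
```

In a fuller system, the expected answer type would then guide the extraction step, for example by selecting a place name from the best-ranked passage.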
Guillaume Jacquet, Cross-lingual Named Entity Recognition (presentation)
The Text and Data Mining Unit, JRC-I3 Unit, develops innovative solutions for retrieving and extracting information from the internet, and especially from online news and social media, serving many Commission Services, EU agencies and some EU Member State authorities. Named Entity Recognition (NER) is part of this information extraction from text. We developed an in-house multilingual NER system, which combines rule-based and machine learning approaches. More recently, some improvements have been made to multi-word entity recognition and fine-grained annotation. The current system and its ongoing extensions will be presented.
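For readers unfamiliar with rule-based NER, the sketch below shows the general idea only: a small lookup list (gazetteer) plus a trigger pattern for person names, including multi-word names. It is not the JRC system. The gazetteer entries, title triggers and labels are hypothetical, and the machine learning and multilingual components are left out entirely.

```python
import re

# Hypothetical gazetteer entries for illustration.
GAZETTEER = {"European Commission": "ORG", "Brussels": "LOC"}

# A title followed by one or more capitalized words is taken to be a person name.
TITLE_PATTERN = re.compile(
    r"\b(?:Mr|Ms|Dr|President)\.? ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)"
)

def recognize(text):
    """Return (entity, label) pairs in order of appearance."""
    entities = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            entities.append((m.start(), name, label))
    for m in TITLE_PATTERN.finditer(text):
        entities.append((m.start(1), m.group(1), "PER"))
    return [(name, label) for _, name, label in sorted(entities)]

found = recognize("Dr Anna Janssens spoke at the European Commission in Brussels.")
```

Rules like these are precise but brittle, which is one reason to combine them with machine learning as the abstract describes.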
Tomasz Debski/Yoana Nikolova, NLU at eJustice (presentation)
We explain ways in which the access to case law decisions may be automated, such as approaches involving question-answering and anonymization of the decisions before online publishing.
Monika Taxer/Margarita Tuch, NLU at ODR (presentation)
We explain the way in which consumers' complaints are currently categorized in order to suggest a dispute resolution body to the consumer, and how this classification may be improved. We discuss the importance of anonymization for providing access to complaints.
Viola Pinzi, NLU at Safer Internet (presentation)
We discuss the topic of harmful content in social media, such as hate speech, and present our wish list regarding the automatic detection of such content (e.g. classification of web pages).
Tom Vanallemeersch, NLU at other DSIs (presentation)
During the last few months, focused meetings were held with all DSIs. In this last session of the day, we highlight our findings of this first round of sessions and demonstrate to what extent the NLP components discussed today play a role within BRIS, Cybersecurity, eProcurement, Open Data Portal, eHealth and Europeana.