About the Sessions
Luc Meertens, Consortium (Welcome)
Philippe Gelin, DG Connect (Objectives of DG Connect)
The overall objective is to contribute to the development of CEF Automated Translation as a “multilingualism enabler” for CEF DSIs, online services linked to CEF DSIs and other relevant public online services.
The specific objectives are (i) to gather information on additional needs of the CEF DSIs and public services; (ii) to analyze the range of services which could extend CEF Automated Translation and (iii) to support CEF DSIs and related systems with a view to maximizing their use of CEF Automated Translation services.
Kris Demuynck, Ghent University (AI & Speech Technology: Past, Present and Future)
Given the recent series of positive reports by Google, Microsoft and others, one might start to believe that speech technology, and the AI systems built on top of it, is now universally applicable and on par with human performance. In reality, however, humans clearly outperform even the best automatic systems on nearly all tasks involving speech. Moreover, successful application of speech technology still requires a good understanding of the constraints of the technology, and of how to circumvent them.
In this talk we will look at how computers handle speech, and how they learn to handle a task given just example data. We will also explain why speech processing, despite the fact that speech seems to come easily to humans, is in fact a very hard task. We will give a brief overview of the various subdomains in speech processing, the related applications, and some of the hurdles and restrictions that still exist, thus providing a foundation for the remainder of the talks.
Olga Gordeeva, Acapela (State of the art Text-to-Speech in multimodal speech and language interfaces)
Text-to-Speech (TTS) solutions are used to turn written content into speech, for a large variety of languages, personas and styles. In recent decades, TTS applications have evolved considerably and have been widely accepted and deployed in multimodal speech and language interfaces in the automotive market, accessibility devices, education and transportation.
In our presentation, we highlight the latest state of the art in TTS and discuss some of the challenges recently addressed by the companies developing the technology. We provide an example of such challenges in "Empathic", a multimodal EU Horizon 2020 project involving close collaboration in the areas of speech, language and audiovisual processing to create an "Empathic Virtual Coach", which aims to extend the independent healthy life years of the elderly.
Arjan Van Hessen, University of Twente (Automatic Speech Recognition)
In general, Automatic Speech Recognition (ASR) for most Western languages has seen a huge increase in quality lately. More and more speech recognizers have also become available as open source and are therefore used by both commercial and non-commercial organizations. A good example is the Dutch KALDI ASR, which is available through the Open Speech Foundation. But despite the excellent performance of these ASR engines, we still see a number of serious issues. The good ASR results are mainly obtained with clear, grammatically correct speech, recorded in a quiet environment with good microphones.
However, results drop significantly (even below 40% correct recognition) when we are confronted with other types of speech and recordings. This presentation will list current challenges and shortcomings that research is confronted with.
Chris Oates, audEERING (Paralinguistics - It's not what you say, it's how you say it!)
Paralinguistics can be described as the study not of what you say but of how you say it. It is the metacommunication we use to modify the meaning of our speech. Paralinguistic information is transmitted through variations in tone, speech rate, pausing, volume and intonation, to name a few. Paralinguistics allows humans to communicate verbally in a far richer way than the simple text of the speech alone would allow.
Paralinguistic information can be used to understand a speaker’s state (emotion), traits (personality) and even the state of their health (voice disorders). Very often, humans can understand and utilize the paralinguistic information contained in speech and act accordingly. At audEERING we are enabling machines to understand these paralinguistic signals in order to fully understand what the speaker really means and how they feel. This technology allows man-machine interaction to move far beyond the simple text comprehension we have today into a more frictionless and natural interaction.
Bart Minne, Myforce (Voice biometrics, a piece of the speech analytics puzzle)
Qualitative customer service is and will remain a strategic differentiator for many organizations. These days, speech analytics evaluates, supports or even automates customer interaction, aiming to deliver best-in-class experiences. Part of this process is the correct verification of a person’s identity in order to allow access to personalized services.
Voice biometrics is the ideal authentication methodology for voice-based customer care. The technology, available in different variants, enables user-friendly authentication, tailored to each specific use case.
Pieter Buteneers, Chatlayer (Why most chatbots fail!)
Why is it that chatbots are so bad most of the time? Is the technology really not ready? And why is it, then, that some chatbots do work? In this talk I will dive deeper into the why and how. I will highlight the biggest issues related to the AI behind chatbots and to how humans use this AI, and for each of these hurdles I will show you how to overcome them.
Jonas Kratochvil, Charles University/ELITR (Development of Czech ASR in the context of the ELITR project)
The presentation will introduce ELITR (European Live Translator; H2020 RIA, a collaborative initiative of Charles University, Karlsruhe Institute of Technology, University of Edinburgh, Pervoice and Alfaview), together with its main goals in the area of real-time automatic subtitling, document and subtitle translation and research towards automatic meeting summarization.
The latter part of the talk will provide insights into the work behind the development of a speech recognition system for the Czech language and its deployment in real-life applications. We will also highlight the common as well as less obvious challenges of this task. These range from data collection, target-domain and speaker adaptation, and integration with machine translation models, to the design and implementation of an appropriate graphical interface for presenting the final output.
Maarten Verwaest, Limecraft (Practical Applications of Language Technology in the Domain of Audiovisual Media)
Limecraft is used in audiovisual media production to automate the processing, editing and subtitling of audiovisual material. In this presentation, we give an overview of practical use cases of AI applications and share best practices in documentary production, archiving, subtitling and localisation. In particular, we will explain how to use state-of-the-art Automatic Speech Recognition (ASR) in combination with Natural Language Processing (NLP) to automatically produce broadcast-compliant subtitles.
Luc Meertens, Consortium (Outro)