Further Details on the Presentations
Luc Meertens, Welcome
Alexandru Ceausu, Objectives of DG Connect
The overall objective is to contribute to the development of CEF Automated Translation as a “multilingualism enabler” for CEF DSIs, online services linked to CEF DSIs and other relevant public online services. The specific objectives are:
(i) to gather information on additional needs of the CEF DSIs and public services;
(ii) to analyze the range of services which could extend CEF Automated Translation; and
(iii) to support CEF DSIs and related systems with a view to maximizing their use of CEF Automated Translation services.
Andreas Eisele, Update of status eTranslation
FOD Kanselarij + Joachim van den Bogaert, Sharing public services data to obtain better MT quality
In its day-to-day activities, the translation department of the Belgian Chancellery of the Prime Minister covers a wide range of topics. For some of these topics, specialized MT systems would be helpful (for example, documents related to legal or policy matters), while for other topics, a broad-domain engine would be more suitable (for example, press releases). In this talk, we explore the differences between broad-domain and domain-specific translation from both the user and provider perspective. We discuss how public services can benefit from tailored MT solutions, and how they can contribute to the development of better MT systems at a European level.
Sara Szoc, MT Training Workflow
This presentation deals with the typical workflow for training an MT system, i.e. incremental training. The MT system is retrained regularly as more parallel data become available (such as a growing translation memory). Previous versions of the MT system act as a baseline that each new version should improve upon in terms of translation quality. The concept of a baseline also applies in the case of domain-specific MT: the latter should improve upon the general-purpose MT system.
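The baseline idea can be sketched as a simple acceptance check. This is a minimal illustration under stated assumptions, not the workflow presented in the talk: the `SystemVersion` class, its `score` field (any automatic quality score measured on the same fixed test set, higher is better) and the `min_gain` threshold are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemVersion:
    """Hypothetical record of one trained MT system version."""
    name: str
    score: float  # quality score on a fixed held-out test set (higher is better)


def select_baseline(
    current: Optional[SystemVersion],
    retrained: SystemVersion,
    min_gain: float = 0.0,
) -> SystemVersion:
    """Return the version that should serve as the baseline from now on.

    The retrained system replaces the current baseline only if its score
    exceeds the baseline score by more than `min_gain`.
    """
    if current is None:
        return retrained  # the first system ever trained becomes the baseline
    if retrained.score - current.score > min_gain:
        return retrained
    return current


# Illustrative values only, not measured results.
general = SystemVersion("general-purpose", 30.0)
domain = SystemVersion("domain-specific", 34.5)
chosen = select_baseline(general, domain, min_gain=1.0)
print(chosen.name)
```

The same check covers both cases in the abstract: a retrained system compared against its previous version, and a domain-specific system compared against the general-purpose one.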
Lieve Macken, Human and automatic evaluation of machine translation output
This talk compares human and automatic evaluation methods. Automatic evaluation metrics are typically used during the development of machine translation systems, for example to quickly compare successive versions of a single system with each other. Human evaluation of MT output is highly informative, but it is expensive in terms of time and expert human effort and may suffer from a lack of consistency.
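To make the notion of an automatic metric concrete, the sketch below computes a clipped n-gram precision between a system output and a reference translation. It is a deliberately simplified illustration, not one of the metrics discussed in the talk: it uses whitespace tokenization and a single reference, and the function names `ngram_counts` and `clipped_precision` are introduced here for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n=1):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it occurs in the
    reference, so repeating a correct word does not inflate the score.
    """
    hyp = ngram_counts(hypothesis.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0  # hypothesis is shorter than n tokens
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


# Compare two successive versions of a system on one sentence (toy example).
reference = "the minister signed the agreement"
output_v1 = "the minister has signed agreement"
output_v2 = "the minister signed the agreement"
print(clipped_precision(output_v1, reference, n=2))
print(clipped_precision(output_v2, reference, n=2))
```

A score like this is cheap to recompute after every retraining, which is why such metrics suit development, whereas it says nothing about fluency or meaning beyond surface overlap with the reference.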
Andrejs Vasiljevs, Terminology within MT
Khalid Choukri, Publicly available corpora
Tom Vanallemeersch, Tools for Data Gathering - DIY
While parallel corpora are available to a limited extent, much more parallel information can be found in document archives or online resources, for instance inside multilingual web sites. This presentation discusses tools and procedures that allow for automatically detecting equivalent documents or web pages and for linking equivalent sentences within these pairs of documents or pages. Such equivalent sentences allow, for instance, for the creation of a domain-specific parallel corpus.
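As an illustration of linking equivalent sentences within a pair of documents, the sketch below aligns two sentence lists with dynamic programming, using only sentence length as evidence. It is a toy heuristic in the spirit of length-based alignment, not one of the tools presented in the talk: the names `length_cost` and `align_sentences` and the `skip_cost` parameter are assumptions made here, and real tools typically also use lexical information and allow one-to-many links.

```python
def length_cost(src_sentence, tgt_sentence):
    """Relative difference in character length, between 0.0 and 1.0."""
    a, b = len(src_sentence), len(tgt_sentence)
    return abs(a - b) / max(a, b, 1)


def align_sentences(src, tgt, skip_cost=1.0):
    """Monotonic alignment of two sentence lists.

    Each sentence is either linked to exactly one sentence on the other
    side or left unlinked at a price of `skip_cost`. Returns the linked
    (source, target) pairs in document order.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def relax(i, j, new_cost, origin):
        # Keep the cheapest way found so far of reaching cell (i, j).
        if new_cost < cost[i][j]:
            cost[i][j] = new_cost
            back[i][j] = origin

    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            if here == inf:
                continue
            if i < n and j < m:  # link src[i] with tgt[j]
                relax(i + 1, j + 1, here + length_cost(src[i], tgt[j]), (i, j, True))
            if i < n:  # leave src[i] unlinked
                relax(i + 1, j, here + skip_cost, (i, j, False))
            if j < m:  # leave tgt[j] unlinked
                relax(i, j + 1, here + skip_cost, (i, j, False))

    # Walk back from the end of both documents to recover the links.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev_i, prev_j, linked = back[i][j]
        if linked:
            pairs.append((src[prev_i], tgt[prev_j]))
        i, j = prev_i, prev_j
    pairs.reverse()
    return pairs


english = ["Welcome.", "The meeting starts at nine o'clock in the main hall."]
french = ["Bienvenue.", "La réunion commence à neuf heures dans la grande salle.", "Merci."]
for pair in align_sentences(english, french):
    print(pair)
```

The resulting pairs are the raw material for a domain-specific parallel corpus; in practice they would still need filtering, since length alone cannot tell a true translation from an unrelated sentence of similar size.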
Luc Meertens, Round Table Discussion “How to proceed from here?”