Innovation and Networks Executive Agency

2020-EU-IA-0078

INEA ceased operations on 31 March 2021. The European Health and Digital Executive Agency (HaDEA) was established on 1 April 2021 to take over the CEF Telecom legacy portfolio as well as additional EU funding programmes.
MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages
Programme: CEF Telecom
Call year:
Location of the Action:
Implementation schedule: June 2021 to May 2023
Maximum EU contribution: €683,693
Total eligible costs: €911,590
Percentage of EU support: 75%
Coordinator: University of Alicante (Spain)

Status:
DSI:
Additional information: 

Digital Single Market (DSM) strategy
http://ec.europa.eu/priorities/digital-single-market

DSM - Connecting Europe Facility
http://ec.europa.eu/digital-single-market/connecting-europe-facility

CEF Digital portal
https://ec.europa.eu/cefdigital

Innovation and Networks Executive Agency (INEA)
http://inea.ec.europa.eu

Automated Translation
https://ec.europa.eu/digital-single-market/en/automated-translation

Last modified: November 2021

This Action aims to improve machine translation output quality by extending the data sets and enhancing their quality, especially for specific under-resourced languages. The Action builds upon the previous CEF-funded Actions ParaCrawl and EuroPat, the H2020 project ‘GoURMET’ and the FP7 MSCA project ‘Abu-MaTran’.

Within the Action, new monolingual and parallel data will be acquired and enriched for the following under-resourced languages: Maltese, Slovenian, Croatian, Bulgarian, Turkish, Serbian, Montenegrin, Macedonian, Albanian and Icelandic. Text classification will be used to determine how appropriate the parallel and monolingual data are for each of the ten DSI categories for which the ELRC repository contains data: e-Health, e-Justice, Online Dispute Resolution, Europeana, Open Data Portal, Business Registers Interconnection System, e-Procurement, Safer Internet, Cybersecurity, and EESSI.

As a result, the Action will extend the data in ELRC-SHARE, focusing on DSI-specific data in line with the automated production and configuration of text translation engines tailored to the needs of online public services in specific domains. Finally, by enriching the data, the Action will contribute to the collection of language resources through ELRC-SHARE and so help improve the quality of the machine translation services offered by CEF AT.