Innovation and Networks Executive Agency


INEA ceased operations on 31 March 2021. The European Health and Digital Executive Agency (HaDEA) was established on 1 April 2021 to take over the CEF Telecom legacy portfolio as well as additional EU funding programmes.
MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages
CEF Telecom
Implementation schedule: June 2021 to May 2023

University of Alicante (Spain)

Additional information: 

Digital Single Market (DSM) strategy

DSM - Connecting Europe Facility

CEF Digital portal

Innovation and Networks Executive Agency (INEA)

Automated Translation

Last modified: November 2021


This Action aims to improve the output quality of machine translation by extending the data sets and enhancing their quality, especially for specific under-resourced languages. The Action builds upon the previous CEF-funded Actions ParaCrawl and EuroPat, the H2020 project ‘GoURMET’ and the FP7 MSCA project ‘Abu-MaTran’.

Within the Action, new monolingual and parallel data will be acquired and enriched for the following under-resourced languages: Maltese, Slovenian, Croatian, Bulgarian, Turkish, Serbian, Montenegrin, Macedonian, Albanian and Icelandic. Text classification will be used to assess how well the parallel and monolingual data fit each of the ten DSI categories for which the ELRC repository contains data: e-Health, e-Justice, Online Dispute Resolution, Europeana, Open Data Portal, Business Registers Interconnection System, e-Procurement, Safer Internet, Cybersecurity, and EESSI.
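The text classification step described above can be illustrated with a minimal sketch. The Action description does not state which classification method is used, so everything here is an assumption made for illustration: the choice of a multinomial Naive Bayes model with add-one smoothing, the hypothetical class name NaiveBayesDomainClassifier, and the invented English training sentences. Only three of the ten DSI categories are shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDomainClassifier:
    """Hypothetical example class: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocabulary = set()                  # all tokens seen during training

    def train(self, labelled_texts):
        """Accumulate token and document counts from (text, category) pairs."""
        for text, category in labelled_texts:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-probability score of the text for each known category."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            # Start from the log prior of the category.
            score = math.log(self.doc_counts[category] / total_docs)
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            result[category] = score
        return result

    def predict(self, text):
        """Return the category with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Invented toy training data; a real system would need far more text per category.
TRAINING = [
    ("The patient received a prescription from the hospital doctor", "e-Health"),
    ("Clinical records and medical treatment of patients", "e-Health"),
    ("The court issued a judgment after the hearing", "e-Justice"),
    ("The judge and the lawyers reviewed the legal proceedings", "e-Justice"),
    ("The contracting authority published a tender for suppliers", "e-Procurement"),
    ("Bids for the public contract must be submitted by suppliers", "e-Procurement"),
]

classifier = NaiveBayesDomainClassifier()
classifier.train(TRAINING)

if __name__ == "__main__":
    print(classifier.predict("The doctor gave the patient a prescription"))
```

In this sketch each sentence is assigned to the single best-scoring category. A production pipeline would more plausibly work on multilingual data, allow a text to match several categories or none, and apply a confidence threshold; none of those details are given in the Action description.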

As a result, the Action will extend the data in ELRC-SHARE, focusing on DSI-specific data, in line with the automated production and configuration of text translation engines tailored to the needs of online public services in specific domains. Finally, by enriching the data, the Action will contribute to the collection of language resources through ELRC-SHARE, improving the quality of the machine translation services offered by CEF AT.