Innovation and Networks Executive Agency

2019-EU-IA-0031

INEA ceased operations on 31 March 2021. The European Health and Digital Executive Agency (HaDEA) was established on 1 April 2021 to take over the CEF Telecom legacy portfolio as well as additional EU funding programmes.
Unsupervised MT for Low-resourced language pairs (MT4All)
Programme: 
CEF Telecom
Call year:
Location of the Action:
Implementation schedule: 
January 2020 to December 2021
Maximum EU contribution: 
€469,098
Total eligible costs: 
€625,464
Percentage of EU support: 
75%
Coordinator: 

UNIVERSIDAD DEL PAIS VASCO / EUSKAL HERRIKO UNIBERTSITATEA (Spain)

Status:
DSI:
Additional information: 

Digital Single Market (DSM) strategy
http://ec.europa.eu/priorities/digital-single-market

DSM - Connecting Europe Facility
http://ec.europa.eu/digital-single-market/connecting-europe-facility

CEF Digital portal
https://ec.europa.eu/cefdigital

Innovation and Networks Executive Agency (INEA)
http://inea.ec.europa.eu

Automated Translation
https://ec.europa.eu/digital-single-market/en/automated-translation

Last modified: 
October 2021

As a language resource project, this Action will facilitate the provision of bilingual corpora for under-resourced languages in fields of public interest at the EU level, such as e-Health and e-Justice.

MT4All will contribute to the CEF Automated Translation Building Block by extending its coverage to language pairs and domains for which parallel data do not exist. At the same time, the Action will build on previous CEF-funded Actions (2016-EU-IA-0111 and 2016-EU-IA-0114) by generating new bilingual resources from monolingual data previously collected by those initiatives. The new bilingual data, exploited by the MT4All technology, will be used to enhance existing machine translation engines of the CEF Automated Translation Building Block and to build new engines for language pairs or domains not yet covered.

Overall, the Action will generate bilingual resources for language pairs that lack sufficient parallel corpora by leveraging recent research in unsupervised learning. In particular, the Action will derive bilingual dictionaries, language models and translation models solely from large monolingual corpora.