Innovation and Networks Executive Agency

2019-EU-IA-0031

Unsupervised MT for Low-resourced language pairs (MT4All)
Programme: CEF Telecom
Call year:
Location of the Action:
Implementation schedule: January 2020 to December 2021
Maximum EU contribution: €469,098
Total eligible costs: €625,464
Percentage of EU support: 75%
Coordinator: UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA (Spain)
Status:
DSI:
Additional information:
Last modified: May 2020


As a language resource project, this Action will facilitate the provision of bilingual corpora for under-resourced languages in fields of public interest at EU level, such as e-Health and e-Justice.

MT4All will contribute to the CEF Automated Translation building block by extending its coverage to language pairs and domains for which no parallel data exist. At the same time, the Action will build on previous CEF-funded Actions (2016-EU-IA-0111 and 2016-EU-IA-0114) by generating new bilingual resources from the monolingual data those Actions collected. The new bilingual data, exploited through the MT4All technology, will be used to improve existing machine translation engines of the CEF Automated Translation building block and to build new engines for language pairs or domains that are not yet covered.

Overall, the Action will generate bilingual resources for language pairs that lack sufficient parallel corpora by drawing on recent research in unsupervised learning. In particular, it will derive bilingual dictionaries, language models and translation models from large monolingual corpora alone.
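The idea of deriving a bilingual dictionary from monolingual data alone can be illustrated with a deliberately tiny sketch of one well-known technique: map independently trained word embeddings into a shared space, alternating between fitting an orthogonal map to the current dictionary and re-inducing the dictionary by nearest-neighbour search (self-learning). This is not the MT4All implementation. The words, the 2-D vectors and the one-pair seed dictionary below are hypothetical, and the vectors are 2-D only so that the orthogonal map reduces to a rotation with a closed-form angle. Fully unsupervised variants bootstrap the initial dictionary from the embeddings themselves rather than from a seed such as identical numerals.

```python
import math

# Hypothetical toy embeddings, 2-D so the mapping step has a closed form.
# Real systems would train these separately on each monolingual corpus.
SRC = {"etxe": (1.0, 0.0), "ur": (0.0, 1.0), "mendi": (0.7, 0.7), "2": (-1.0, 0.2)}
# Target space: the same geometry rotated by 90 degrees, as if trained independently.
TGT = {"house": (0.0, 1.0), "water": (-1.0, 0.0), "mountain": (-0.7, 0.7), "2": (-0.2, -1.0)}


def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)


def cosine(u, v):
    """Cosine similarity of two non-zero 2-D vectors."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


def fit_rotation(pairs):
    """2-D orthogonal Procrustes restricted to rotations: the angle that
    maximises sum(y . R(theta) x) over the dictionary pairs (x, y)."""
    dots = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    crosses = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    return math.atan2(crosses, dots)


def induce(src, tgt, theta):
    """Map each source word and pair it with its nearest target word by cosine."""
    dictionary = {}
    for s, v in src.items():
        mapped = rotate(v, theta)
        dictionary[s] = max(tgt, key=lambda t: cosine(mapped, tgt[t]))
    return dictionary


def self_learning(src, tgt, seed, max_iters=10):
    """Alternate between fitting the map and re-inducing the dictionary
    until the dictionary stops changing (or max_iters is reached)."""
    dictionary = dict(seed)
    theta = 0.0
    for _ in range(max_iters):
        theta = fit_rotation([(src[s], tgt[t]) for s, t in dictionary.items()])
        new_dictionary = induce(src, tgt, theta)
        if new_dictionary == dictionary:
            break
        dictionary = new_dictionary
    return dictionary, theta


# Assumed weak seed: identical strings (the numeral "2") in both vocabularies.
dictionary, theta = self_learning(SRC, TGT, seed={"2": "2"})
print(dictionary)  # {'etxe': 'house', 'ur': 'water', 'mendi': 'mountain', '2': '2'}
print(round(math.degrees(theta)))  # 90
```

With real embeddings of a few hundred dimensions, the rotation step becomes the general orthogonal Procrustes problem (solved with a singular value decomposition), and nearest-neighbour retrieval runs over a much larger vocabulary.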