Innovation and Networks Executive Agency

2017-EU-IA-0136

Multilingual Resources for CEF.AT in the legal domain
Programme: 
CEF Telecom
Call year:
Location of the Action:
Implementation schedule: 
October 2018 to September 2020
Maximum EU contribution: 
1 412 786 €
Total eligible costs: 
1 883 715 €
Percentage of EU support: 
75%
Coordinator: 

Research Institute for Linguistics of the Hungarian Academy of Sciences (Hungary)
http://www.nytud.hu/eng/

Status:
DSI:
Additional information: 
Last modified: 
September 2019

The overall objective of this Action is to provide automatic translation of the body of national legislation in seven countries: Bulgaria, Croatia, Hungary, Poland, Romania, Slovakia and Slovenia. At present, national legislation texts are not automatically available to CEF.AT, and current Machine Translation (MT) systems could be improved if they had access to these texts.

The Action aims to process two resources available in all the languages concerned: the multilingual ontology-based thesaurus EUROVOC, and the corpora of national legislation in the respective languages. The following deliverables will be produced:

1) Seven large-scale, suitably pre-processed monolingual corpora of national legislation documents, classified by EUROVOC topics/descriptors and annotated with the EUROVOC and IATE terms identified in them.
2) A comparable corpus covering the seven languages, aligned at the level of the topic domains identified by EUROVOC descriptors.
3) A Croatian-English parallel corpus consisting of 1800 legislative documents.
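
As an illustration of what alignment at the topic level can mean for the comparable corpus in deliverable 2, the following minimal Python sketch groups documents from different languages under the descriptors they share. It is not the Action's actual pipeline: the record layout, the function name `align_by_descriptor`, the document ids and the descriptor ids ("D1", "D2", "D3") are all hypothetical placeholders, not real EUROVOC identifiers.

```python
from collections import defaultdict

# Hypothetical records: (language code, document id, descriptor ids).
# All ids are made-up placeholders, not real EUROVOC identifiers.
documents = [
    ("hu", "hu-001", {"D1", "D2"}),
    ("pl", "pl-001", {"D1"}),
    ("hr", "hr-001", {"D2"}),
    ("hr", "hr-002", {"D3"}),
]


def align_by_descriptor(docs):
    """Group document ids by descriptor, then by language."""
    aligned = defaultdict(lambda: defaultdict(list))
    for lang, doc_id, descriptors in docs:
        for descriptor in descriptors:
            aligned[descriptor][lang].append(doc_id)
    # Keep only descriptors that link documents from at least two languages.
    return {d: dict(langs) for d, langs in aligned.items() if len(langs) >= 2}


aligned = align_by_descriptor(documents)
```

In this toy data, "D3" is dropped because only one language has a document classified under it, so it contributes nothing to a cross-language comparable corpus.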

The Action will also have an impact on both the e-Justice and the Online Dispute Resolution Digital Service Infrastructures (DSIs), as the resources focus on national legislation, which is of direct relevance to both DSIs.