Innovation and Networks Executive Agency

2018-EU-IA-0061

EuroPat: Unleashing European Patent Translations
Programme: CEF Telecom
Call year:
Location of the Action:
Implementation schedule: September 2019 to September 2021
Maximum EU contribution: €695,890
Total eligible costs: €927,854
Percentage of EU support: 75%
Coordinator: University of Edinburgh (United Kingdom)
https://www.ed.ac.uk/

Status:
DSI:
Additional information: 
Last modified: January 2021

The Action "EuroPat: Unleashing European Patent Translations" will mine parallel corpora from patents by aggregating, aligning, and converting patent data. The targeted language pairs combine English with each of the following languages: Croatian, Norwegian (Bokmål), German, Polish, Spanish, and French. Icelandic may be added, contingent on agreement with the Icelandic Patent Office and on the size of its data.

The aim of the Action is to prepare clean, processed parallel corpora in the patent domain. The choice of domain is justified by the high quality of patent translations, the large volume of data, and permissive copyright conditions. Moreover, patents are a rich source of technical vocabulary, product names, and person names that complement other data sources. In addition to ingesting European Patent Office (EPO) data in many languages, the Action also targets data from the Croatian and Norwegian national patent offices.

The Action will contribute to CEF eTranslation by providing good-quality data. Because neural machine translation (NMT) engines are sensitive to the quality of their training data, they perform better when trained on clean, good-quality data.
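
To illustrate what "clean" parallel data can mean in practice, the following is a minimal sketch of a few common filtering heuristics for sentence pairs: dropping empty segments, untranslated copies, overlong segments, pairs with an implausible length ratio, and exact duplicates. It is not the Action's actual pipeline. The function name `clean_pairs`, its thresholds, and the sample sentences are all assumptions made for this example.

```python
def clean_pairs(pairs, max_ratio=3.0, max_len=200):
    """Filter (source, target) sentence pairs with simple heuristics.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the EuroPat Action.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop untranslated copies (source identical to target).
        if src == tgt:
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overlong segments, measured in whitespace tokens.
        if n_src > max_len or n_tgt > max_len:
            continue
        # Drop pairs whose token counts differ too much to be translations.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop exact duplicates, keeping the first occurrence.
        key = (src, tgt)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept


# Made-up English-German sample pairs for illustration only.
sample = [
    ("A method for producing steel.", "Ein Verfahren zur Herstellung von Stahl."),
    ("A method for producing steel.", "Ein Verfahren zur Herstellung von Stahl."),
    ("Figure 1", ""),
    ("Claim 2", "Claim 2"),
    ("The device.", "Die Vorrichtung umfasst ein Gehäuse, einen Motor und eine Steuerung."),
]
cleaned = clean_pairs(sample)
print(len(cleaned))
```

Real corpus-cleaning pipelines typically add further checks, such as language identification and alignment scoring, which this sketch leaves out.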