Innovation and Networks Executive Agency


INEA ceased operations on 31 March 2021. The European Health and Digital Executive Agency (HaDEA) was established on 1 April 2021 to take over the CEF Telecom legacy portfolio as well as additional EU funding programmes.
Continued Web-Scale Provision of Parallel Corpora for European Languages
CEF Telecom
Call year:
Location of the Action:
Implementation schedule: October 2019 to September 2021
Maximum EU contribution: 
Total eligible costs: 
Percentage of EU support: 

University of Edinburgh (United Kingdom)

Additional information: 

Digital Single Market (DSM) strategy

DSM - Connecting Europe Facility

CEF Digital portal

Innovation and Networks Executive Agency (INEA)

Automated Translation

Last modified: April 2022


This language resource Action aims to improve and expand the parallel corpora developed in two CEF-funded Actions: ParaCrawl-1 (Action No 2016-EU-IA-0114) and ParaCrawl-2 (Action No 2017-EU-IA-0178). These previous Actions have already resulted in the release of the largest publicly available parallel corpora to date for all EU/EEA official languages paired with English, as well as a complete open-source software toolkit for end-to-end crawling and extraction.

This Action will offer improved extraction software capable of efficiently processing an even larger portion of the Web (more than 1 petabyte of compressed data). At the same time, it will apply state-of-the-art neural methods to the detection of parallel sentences and to the processing of the extracted corpora. Special emphasis will be placed on collecting larger corpora for language pairs that are currently under-resourced.

The corpora will be made more useful for training machine translation (MT) systems by post-processing the data to split long sentences, repair broken sentences and synthesise new sentences. The new corpus releases will be made available via a data portal that will allow users building MT systems to select the types of text that best fit their purpose.