
ECDC-Translation Memory

Introduction

In October 2012, the European Centre for Disease Prevention and Control (ECDC), an agency of the European Union (EU), released a translation memory (TM), i.e. a collection of sentences and their professionally produced translations, in 25 languages. The data is distributed via the web pages of the European Commission's Joint Research Centre (JRC). This page describes the resource, which is called the ECDC Translation Memory, or ECDC-TM for short.

Languages / File Format

ECDC-TM covers 25 languages: the 23 official languages of the EU plus Norwegian (Norsk) and Icelandic. ECDC-TM was created by translating from English into the following 24 languages: Bulgarian, Czech, Danish, Dutch, Estonian, Gaeilge (Irish), German, Greek, Finnish, French, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Maltese, Norwegian (Norsk), Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish. The JRC then combined these 24 translation memory files into one large translation memory, which also makes it possible to extract translation units for language pairs that do not include English.
All documents and sentences were thus originally written in English. They were then translated into the other languages by professional translators from the Translation Centre CdT in Luxembourg.
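Because every translation unit shares the same English source, units for a pair such as German-French can be read directly from the combined memory. The following is a minimal sketch only. It assumes the combined memory is stored in the TMX format, with one tuv element per language inside each tu element. The function name extract_pairs and the sample data are invented for illustration and are not taken from ECDC-TM.

```python
import xml.etree.ElementTree as ET

# Attribute name that ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_pairs(tmx_root, src_lang, tgt_lang):
    """Return (source, target) pairs for units that contain both languages."""
    pairs = []
    for tu in tmx_root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").upper()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        # Skip units that lack one of the two requested languages.
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


# Invented sample data in TMX layout (hypothetical, not from ECDC-TM).
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="EN"><seg>Cholera</seg></tuv>
  <tuv xml:lang="DE"><seg>Cholera</seg></tuv>
  <tuv xml:lang="FR"><seg>Choléra</seg></tuv>
</tu>
<tu>
  <tuv xml:lang="EN"><seg>Anthrax</seg></tuv>
  <tuv xml:lang="DE"><seg>Milzbrand</seg></tuv>
</tu>
<tu>
  <tuv xml:lang="EN"><seg>Hepatitis</seg></tuv>
  <tuv xml:lang="DE"><seg>Hepatitis</seg></tuv>
  <tuv xml:lang="FR"><seg>Hépatite</seg></tuv>
</tu>
</body></tmx>"""

root = ET.fromstring(SAMPLE_TMX)
de_fr = extract_pairs(root, "DE", "FR")
for german, french in de_fr:
    print(german, "|", french)
```

In this sample the second unit has no French segment, so it is left out of the German-French result. For a real file, ET.parse(path).getroot() would replace the ET.fromstring call.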

Text types / Domain

ECDC-TM was built from the website of the European Centre for Disease Prevention and Control (ECDC). Most of the documents cover health-related topics (anthrax, botulism, cholera, dengue fever, hepatitis, etc.), but some of the web pages also describe the ECDC itself (e.g. its organisation, job opportunities) and its activities (e.g. epidemic intelligence, surveillance). The file ECDC-domains.xlsx gives further details.

 

Statistics for the ECDC Translation Memory

The following table shows the size of the ECDC Translation Memory per language: the number of translation units, the total number of words and characters in the whole corpus, and the average number of words and characters per translation unit.
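Per-language figures of this kind can be recomputed from the segments themselves. The sketch below is illustrative only: it assumes words are counted by splitting on whitespace and characters by string length, which may differ from the counting method used for the published figures. The function name tm_statistics and the sample segments are hypothetical.

```python
def tm_statistics(units_by_lang):
    """Compute size statistics per language.

    units_by_lang maps a language code to the list of its segments.
    Words are whitespace-separated tokens (an assumption of this sketch).
    """
    stats = {}
    for lang, segments in units_by_lang.items():
        n_units = len(segments)
        n_words = sum(len(segment.split()) for segment in segments)
        n_chars = sum(len(segment) for segment in segments)
        stats[lang] = {
            "units": n_units,
            "words": n_words,
            "chars": n_chars,
            # Guard against languages with no segments.
            "avg_words": n_words / n_units if n_units else 0.0,
            "avg_chars": n_chars / n_units if n_units else 0.0,
        }
    return stats


# Invented sample segments (hypothetical, not from ECDC-TM).
sample = {
    "EN": ["Cholera is an acute infection.", "Hepatitis"],
    "DE": ["Cholera"],
}

stats = tm_statistics(sample)
for lang, row in stats.items():
    print(lang, row)
```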

 

Terms of Use

By downloading or using the ECDC-Translation Memory, you are bound by the ECDC-TM usage conditions (PDF).

 

Further Translation Memories (and more) available on our site

The public release of the ECDC-Translation Memory follows the release of various other multilingual resources via the JRC's website. These include the JRC-Acquis parallel corpus (22 languages, since 2006); the DGT-Translation Memory, DGT-TM (22 languages, since 2007); the JRC-Names multilingual and multi-script name variant list and related software (since 2011); and the JRC Eurovoc Indexer (JEX) multilingual document categorisation software (22 languages, since 2012). For details and for other, smaller linguistic resources, see the JRC-Resources page.
Further multilingual linguistic resources will be made available in the future.

 

Download the ECDC Translation Memory

The distribution of the ECDC Translation Memory consists of a single zip file (ECDC-TM.zip), which can be downloaded by clicking on the link below.

Referring to this resource

When referring to the ECDC-TM in publications, please use the following reference:

Acknowledgement and Contact

For more information on ECDC-TM, please contact:
 

Web Editor for Multilingual Content
Email address: webmaster@ecdc.europa.eu
European Centre for Disease Prevention and Control (ECDC)
Tomtebodavägen 11A
171 83 Stockholm, Sweden
URL: http://www.ecdc.europa.eu

 

Joint Research Centre (JRC)
Ralf Steinberger (Email address format: Firstname.Lastname@jrc.ec.europa.eu)
IPSC - GlobeSec - OPTIMA
Via E. Fermi 2749, T.P. 267
I-21027 Ispra (VA)