Translation memories are parallel texts, i.e., texts together with their manually produced translations; such texts are also referred to as bi-texts. A translation memory is a collection of small text segments and their translations, referred to as translation units (TUs). These TUs can be whole sentences or parts of sentences. Translation memories support translators by ensuring that pieces of text that have already been translated do not need to be translated again.
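To make the reuse mechanism concrete, here is a minimal sketch of a translation memory as a lookup table of TUs. It assumes a hypothetical `TranslationUnit`/`TranslationMemory` pair of classes (not part of any ECDC or JRC distribution), normalizes only whitespace for exact matches, and falls back to a fuzzy match using Python's standard `difflib.get_close_matches`. Commercial translation tools use their own match scoring, so the `fuzzy_cutoff` of 0.8 is an illustrative assumption.

```python
import difflib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TranslationUnit:
    """One translation unit (TU): a source segment and its human translation."""
    source: str
    target: str


class TranslationMemory:
    """Hypothetical in-memory TM: exact lookup first, then a fuzzy fallback."""

    def __init__(self, units: Iterable[TranslationUnit], fuzzy_cutoff: float = 0.8):
        self.fuzzy_cutoff = fuzzy_cutoff
        # Keyed by whitespace-normalized source text; the first translation seen wins.
        self._index: dict[str, str] = {}
        for tu in units:
            self._index.setdefault(self._normalize(tu.source), tu.target)

    @staticmethod
    def _normalize(text: str) -> str:
        # Collapse runs of whitespace so trivial spacing differences still match.
        return " ".join(text.split())

    def lookup(self, segment: str) -> Optional[str]:
        key = self._normalize(segment)
        if key in self._index:
            return self._index[key]  # exact match: reuse the stored translation
        # Fuzzy match: closest stored source whose similarity ratio >= cutoff.
        close = difflib.get_close_matches(key, list(self._index), n=1, cutoff=self.fuzzy_cutoff)
        return self._index[close[0]] if close else None


# Illustrative English-German TUs (made-up examples, not taken from ECDC-TM).
tm = TranslationMemory([
    TranslationUnit("Wash your hands regularly.", "Waschen Sie sich regelmäßig die Hände."),
    TranslationUnit("Seasonal influenza", "Saisonale Grippe"),
])

exact = tm.lookup("Wash  your hands regularly.")   # extra space, still an exact match
fuzzy = tm.lookup("Wash your hands regularly!")    # differs in punctuation only
missing = tm.lookup("Stock market report")         # nothing similar stored
```

A segment with no sufficiently close match returns `None`, which is the point at which a translator would have to translate from scratch.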
Both translation memories and parallel texts are important linguistic resources that can be used for a variety of purposes, including:
- training automatic systems for statistical machine translation (SMT);
- producing monolingual or multilingual lexical and semantic resources such as dictionaries and ontologies;
- training and testing multilingual information extraction software;
- checking translation consistency automatically;
- testing and benchmarking alignment software (for sentences, words, etc.).
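As one illustration of the consistency-checking use listed above, the sketch below groups TUs by their source segment and reports any segment that was translated in more than one way. The function name `inconsistent_translations` and the sample pairs are hypothetical; a real check would typically also account for context, since a different translation is not always an error.

```python
from collections import defaultdict


def inconsistent_translations(pairs):
    """Return {source: sorted distinct targets} for sources with >1 translation."""
    targets = defaultdict(set)
    for source, target in pairs:
        # Normalize whitespace so spacing differences are not reported as inconsistencies.
        targets[" ".join(source.split())].add(" ".join(target.split()))
    return {src: sorted(tgts) for src, tgts in targets.items() if len(tgts) > 1}


# Made-up English-German TUs: the first source segment has two different translations.
pairs = [
    ("Seasonal influenza", "Saisonale Grippe"),
    ("Seasonal influenza", "Saisonale Influenza"),
    ("Hand hygiene", "Händehygiene"),
]
report = inconsistent_translations(pairs)
```

Here `report` flags only "Seasonal influenza", leaving it to a reviewer to decide which rendering should be preferred.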
The value of a parallel corpus grows with its size and with the number of languages for which translations exist. While parallel corpora are abundant for some languages, few or none exist for most language pairs. Apart from being freely available, the most outstanding advantage of the various parallel corpora available via our web pages is the large number of rare language pairs they cover (e.g., Maltese-Estonian or Slovenian-Finnish).
The ECDC-TM is relatively small compared to the JRC-Acquis and the DGT-TM, but it has the advantage of focusing on a very different domain, namely public health. It also includes translation units for Irish (Gaeilge, GA), Norwegian (Norsk, NO) and Icelandic (IS).