The MARCELL CEF Telecom Action aims to bring down linguistic barriers within the Digital Single Market.

One of the project's primary goals is to make digital service platforms, such as Online Dispute Resolution, eJustice, and Europeana, accessible in multiple languages. The eTranslation Building Block has a key role in this endeavour.

eTranslation is provided by the Connecting Europe Facility and faces the daunting task of delivering quality machine translation (MT) services across all the EU's Digital Service Infrastructures (DSIs) in all the EU's official languages.

One challenge for MT is the scarcity of high-quality language data and the potential for inaccurate information. Ideally, the language data used for training the MT system should cover specific domains relevant to citizens' lives, such as consumer rights and justice. National legislative texts, however, are not automatically available to eTranslation, and current MT systems could improve if they had access to them.

The MARCELL Project

MARCELL's overall goal is to improve machine translation of national legislation (laws, decrees, regulations) in seven countries: Bulgaria, Croatia, Hungary, Poland, Romania, Slovakia, and Slovenia. The project provides large-scale monolingual legal data, which will then be applied to other eTranslation systems. The project covers the total body of national legislative documents that are in force in these seven EU Member States.

Because the Member States' national legislation is not automatically available to the European Commission (EC), MARCELL relied mainly on EU legislation for its training sessions. The legal domain differs widely in terms of content. Under EUROVOC, the official EU multilingual ontology-based thesaurus, the documents in the seven monolingual datasets fall into 21 top-level domains, including politics, economics, trade, education, communication, and science. The classification will thus yield 21 thematic sub-corpora in each language.
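The split into thematic sub-corpora can be pictured as a simple grouping step. The sketch below is illustrative only and is not MARCELL's actual code: the `LegalDocument` class, its field names, and the `build_subcorpora` function are hypothetical, and it assumes each document already carries one or more EUROVOC top-level domain labels assigned by an upstream classifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalDocument:
    """Hypothetical record for one legislative document."""
    doc_id: str               # assumed identifier, not a real MARCELL field
    language: str             # language code, e.g. "hr" or "pl"
    domains: tuple[str, ...]  # EUROVOC top-level domain labels (assumed given)
    text: str


def build_subcorpora(documents):
    """Group documents into sub-corpora keyed by (language, domain).

    A document with several domain labels is placed in each matching
    sub-corpus, so sub-corpora may overlap.
    """
    subcorpora = defaultdict(list)
    for doc in documents:
        for domain in doc.domains:
            subcorpora[(doc.language, domain)].append(doc)
    return dict(subcorpora)
```

With 21 top-level domains, this grouping gives at most 21 sub-corpora per language, one for each domain that has at least one document.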

What is eTranslation?

eTranslation is an automated translation tool available to translate text snippets or full documents. It can also be integrated into a specific digital system if you need translation capabilities. 

The tool supports more than 30 languages, including Russian and simplified Chinese, across different domains. Users can also integrate eTranslation into their systems to make digital content and services multilingual and accessible to anyone in the EU.


 

Data Collection and Curation

The total number of sentences collected across the seven languages has reached 30 000, ranging from 1 000 to 10 000 per language, and these numbers will continue to grow.

New national legislative texts appear every day in each of the seven countries. To keep up, the consortium has built processing chains (pipelines) that periodically collect new legislative texts from the official national providers, using push or pull techniques.

The pipelines then convert those texts into a suitable format before gathering all relevant metadata. MARCELL processes the texts and delivers them to the existing ELRC-SHARE repository, which feeds the eTranslation systems with training material.
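The collect, convert, annotate, and deliver steps described above can be sketched as one periodic run. This is a minimal illustration under stated assumptions, not the consortium's implementation: `fetch_new` and `deliver` are hypothetical callables standing in for a national provider and the ELRC-SHARE upload, `normalise` is a placeholder for real format conversion, and the metadata fields are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ProcessedText:
    """Hypothetical container for one converted legislative text."""
    source_id: str
    body: str
    metadata: dict = field(default_factory=dict)


def normalise(raw: str) -> str:
    """Placeholder for format conversion: collapse all runs of whitespace."""
    return " ".join(raw.split())


def run_pipeline(
    fetch_new: Callable[[], Iterable[tuple[str, str]]],
    deliver: Callable[[ProcessedText], None],
    language: str,
    seen: set,
) -> int:
    """Run one pull cycle and return the number of texts delivered.

    fetch_new yields (source_id, raw_text) pairs from a provider (assumed).
    seen holds ids delivered in earlier cycles, so repeated runs skip them.
    """
    delivered = 0
    for source_id, raw in fetch_new():
        if source_id in seen:
            continue  # already handled in a previous cycle
        body = normalise(raw)
        metadata = {"language": language, "num_tokens": len(body.split())}
        deliver(ProcessedText(source_id, body, metadata))
        seen.add(source_id)
        delivered += 1
    return delivered
```

Passing the fetch and deliver steps in as callables keeps the sketch runnable without network access. A scheduler calling `run_pipeline` at a fixed interval would correspond to the pull technique, while a provider-triggered call would correspond to push.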

Expected Results

As a result, MARCELL will produce:

  1. Seven large-scale pre-processed texts of national legislation classified in EUROVOC top-level domains and supporting EUROVOC and IATE terms.
  2. Translated comparable legal texts in seven languages aligned with the top-level domains identified by EUROVOC descriptors.
  3. A Croatian-English parallel corpus comprising 1,800 legislative documents.
  4. A set of seven pipelines for processing and feeding new legislative documents in the seven languages concerned.

As the most recent EU official language, Croatian is six to nine years behind in the systematic accumulation of translation memories (TMs). 

Therefore, the Croatian-English Parallel Corpus of Croatian National Legislation was set up, with legal texts dating from 1990 to 2019. So far, MARCELL has translated 1,800 documents into English.

Future steps

As MARCELL resources become available to train eTranslation engines, one can expect noticeable improvements in the output quality when translating legal texts into one of the seven languages.

Besides the expected general improvement of the MT system in the seven languages concerned, MARCELL will have significant benefits for both the eJustice and the Online Dispute Resolution platforms. MARCELL's resources focus on national legislation, directly related to both these DSIs.


Last updated on Mar 13, 2021 11:59