EU Science Hub

JRC-Acquis

 

What are the Acquis Communautaire and the JRC-Acquis?

The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 official languages. The Acquis Communautaire texts exist in these languages, although Irish translations are not currently available. The Acquis Communautaire is thus a collection of parallel texts in the following 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian and Swedish.

The data release by the JRC is in line with the general effort of the European Commission to support multilingualism, language diversity and the re-use of Commission information.

The JRC did not receive an authoritative list of documents that belong to the Acquis Communautaire. In order to compile the document collection distributed here, we selected all those CELEX documents (see below) that were available in at least ten of the twenty EU-25 languages (the official languages of the EU before Bulgaria and Romania joined in 2007) and that additionally existed in at least three of the nine languages that became official languages with the Enlargement of the EU in 2004 (i.e. Czech, Estonian, Hungarian, Lithuanian, Latvian, Maltese, Polish, Slovak and Slovenian). The collection distributed here is thus an approximation of the Acquis Communautaire which we call the JRC-Acquis. The JRC-Acquis must not be seen as a legal reference corpus. Instead, the purpose of the JRC-Acquis is to provide a large parallel corpus of documents for (computational) linguistics research purposes.
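The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the two thresholds, not the procedure actually used by the JRC: the function names, the use of ISO 639-1 language codes and the shape of the input (a mapping from a document identifier to the languages in which it exists) are all assumptions made for the example.

```python
# Illustrative sketch only: names, language codes and input format are assumptions.

# The nine languages that became official with the 2004 Enlargement.
NEW_2004 = frozenset({"cs", "et", "hu", "lt", "lv", "mt", "pl", "sk", "sl"})

# The eleven official languages of the EU before the 2004 Enlargement.
PRE_2004 = frozenset({"da", "de", "el", "en", "es", "fi", "fr", "it", "nl", "pt", "sv"})

# The twenty EU-25 languages (Bulgarian and Romanian are not included).
EU25 = PRE_2004 | NEW_2004


def qualifies(available, min_eu25=10, min_new=3):
    """Return True if a document meets both language-coverage thresholds."""
    langs = set(available)
    enough_eu25 = len(langs & EU25) >= min_eu25
    enough_new = len(langs & NEW_2004) >= min_new
    return enough_eu25 and enough_new


def select_documents(availability):
    """Return the sorted identifiers of all documents that qualify.

    `availability` maps a (hypothetical) document identifier to the
    collection of language codes in which that document exists.
    """
    return sorted(doc for doc, langs in availability.items() if qualifies(langs))
```

With this sketch, a document available in seven pre-2004 languages and three 2004 languages is selected, whereas a document available in all eleven pre-2004 languages but in none of the nine newer ones is not.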

 

Statistics for version 3.0 of the JRC-Acquis corpus

The JRC-Acquis corpus (version 3.0) is currently available in 22 languages with the following distribution:

What is the difference between the JRC-Acquis and the other EU corpora?

JRC-Acquis, DGT-Acquis and DCEP are corpora consisting of full texts with additional information on which sentences are aligned with which others, while the Translation Memories DGT-TM, EAC-TM and ECDC-TM are collections of translation units (mostly sentences), from which the full text cannot be reproduced. Some of the resources overlap, while others are entirely different. JRC-Acquis documents are additionally accompanied by information on the manually assigned Eurovoc subject domain classes so that the JRC-Acquis can also be used to train automatic multi-label classification software.

For details and background information on each of the multilingual resources, read the overview article "An overview of the European Union's highly multilingual parallel corpora".

 

Usage conditions / Licensing issues

I. Intellectual property and conditions of use of data

The JRC-Acquis data is the exclusive property of the European Commission. The Commission cedes its non-exclusive rights free of charge and world-wide for the entire duration of the protection of those rights to the re-user, for all kinds of use which comply with the conditions laid down in the Commission Decision of 12 December 2011 on the re-use of Commission documents, published in the Official Journal of the European Union L330 of 14 December 2011, pages 39 to 42.

Download the JRC-Acquis corpus

AC Corpus - version 3.0 (by language)

AC aligned corpus using Vanilla aligner

AC aligned corpus using HunAlign

By downloading these resources, you agree to the usage conditions.

Previous version: JRC-ACQUIS Multilingual Parallel Corpus, Version 2.2.

A history of changes regarding the preparation of this corpus is also available.

 

Acknowledgement / Reference publication

A description of the JRC-Acquis corpus (version 2.2) was published in the paper below. Please use this reference publication when referring to the JRC-Acquis.

 

To compare JRC-Acquis with the other linguistic resources distributed by EU institutions, see: