EU Science Hub

JRC-Names

 

What is JRC-Names?

JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and their many spelling variants (up to hundreds for a single person), including across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). Since March 2016, JRC-Names has also been available as linked data, including additional information such as frequencies per language, titles found with the entities, and date ranges.

 

What can JRC-Names be used for?

JRC-Names is a technical resource that can be used to find names in text even when they are spelled differently. It is also a useful building block for IT systems that process text, e.g. for text mining.
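
To illustrate how a list of spelling variants helps to find a name regardless of its spelling, here is a minimal Python sketch. It is not the JRC-Names software: the class VariantIndex, the function normalise, the identifier "e1" and the sample variants are all hypothetical and chosen only for illustration. The sketch maps every known variant to a shared entity identifier after a simple normalisation (case folding, removal of diacritics, whitespace collapsing).

```python
import unicodedata
from collections import defaultdict


def normalise(name: str) -> str:
    """Case-fold, strip combining marks (diacritics) and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


class VariantIndex:
    """Hypothetical in-memory index: spelling variant -> entity identifier."""

    def __init__(self) -> None:
        self._by_variant = {}
        self._by_entity = defaultdict(list)

    def add(self, entity_id: str, variant: str) -> None:
        # Store the variant under its normalised form, keep the original spelling too.
        self._by_variant[normalise(variant)] = entity_id
        self._by_entity[entity_id].append(variant)

    def lookup(self, name: str):
        """Return the entity identifier for a name, or None if it is unknown."""
        return self._by_variant.get(normalise(name))

    def variants(self, entity_id: str):
        """Return all stored spellings for an entity identifier."""
        return list(self._by_entity.get(entity_id, []))


if __name__ == "__main__":
    index = VariantIndex()
    # Illustrative data only; "e1" is not a real JRC-Names identifier.
    for spelling in ("Angela Merkel", "Ангела Меркель", "Άνγκελα Μέρκελ"):
        index.add("e1", spelling)
    print(index.lookup("ANGELA  MERKEL"))
    print(index.variants("e1"))
```

Exact lookup after normalisation only finds variants that are already in the list, which is why a resource with many variants per entity, including variants in other scripts, is useful.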

 

How was JRC-Names produced?

JRC-Names is a by-product of the analysis of about 220,000 news reports per day by the Europe Media Monitor (EMM) family of applications.

 

Statistics on JRC-Names

JRC-Names contains the most important names of the EMM name database, i.e. names that were found frequently, verified manually, or found on Wikipedia.

 

Related information

JRC-Names (version 1) is described in the first publication below. The Linked Data version of JRC-Names is described in the second. Please cite these publications when you refer to JRC-Names:

 

Usage conditions

By downloading and/or using JRC-Names, you agree to the usage conditions formulated in the licence, which is available at http://optima.jrc.it/Resources/LICENCE-EULA_JRC-Names_2011.pdf.

 

Privacy statement

JRC-Names is subject to a privacy statement.

 

Download JRC-Names

Depending on your needs, you may want to download part or all of the following components:

  • JRC-Names Java demonstrator code: This .jar file allows you to analyse UTF-8-encoded text files in order to recognise known named entities. It can also generate a list of all known variants for any input name. It needs to be used in combination with the entity resource file.
  • JRC-Names named entity resource file: This file contains the list of names and their variants. The file is planned to be updated daily so that it includes the most recently added entity names (filename: entities.gzip; zipped size: ca. 5.6 MB; unzipped: ca. 18 MB).
  • JRC-Names Java source code: You only need this if you want to integrate the resource into your own environment.
  • JRC-Names documentation: This is the documentation for the Java software.
  • JRC-Names linked data version: This version can be accessed on the EU's Open Data portal, where it is also available as an RDF file.
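
For readers who want to inspect the entity resource file without the Java software, the following Python sketch shows one way to read a gzip-compressed list of name variants. The file layout is an assumption, not something stated on this page: the sketch assumes one variant per line with four tab-separated fields (entity identifier, entity type, language code, name) and "+" between the words of a name. Please check the JRC-Names documentation for the actual format. The function name load_variants is hypothetical.

```python
import gzip
from collections import defaultdict


def load_variants(path: str) -> dict:
    """Group name variants by entity identifier.

    Assumed (unverified) line layout: id <TAB> type <TAB> language <TAB> name,
    with '+' separating the words of a name.
    """
    variants = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip comment lines and anything that does not match the assumed layout.
            if line.startswith("#") or len(fields) != 4:
                continue
            entity_id, _entity_type, _language, name = fields
            variants[entity_id].append(name.replace("+", " "))
    return dict(variants)


if __name__ == "__main__":
    # "entities.gzip" is the filename given in the download list above.
    names = load_variants("entities.gzip")
    print(len(names), "entities loaded")
```

The resulting dictionary can be used to list all variants of one entity, or inverted to look up the entity that a given spelling belongs to.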