CEF Named-entity recognition


What is named-entity recognition?

Named-entity recognition (NER) is a technique that identifies named entities (people, places, things and so on) in a text.

It extracts a list of the people, places or things from documents submitted to it.

This can be useful in many information-processing tasks. For example, when researching the EU, you could scrape the web and do keyword searches to find all the pages that mention "Ursula von der Leyen" or perhaps, if you have a taste for controversy, "Donald Trump".

It can also be useful for monitoring media interest in individuals or places that are currently in the news.

For machine translation, it can be used to identify individuals or places and ensure that the official or preferred translations are used.

This system will generate a marked-up list - what you then do with it is limited only by your imagination!


How to use the CEF Named-entity recognition web service

The CEF named-entity recognition service expects a Base64-encoded file as input and returns a tagged XML file as output.

The CEF Named-entity recognition web service is asynchronous. This means that the client sends a NER request and is notified once the document has been processed. In this way, calling the web service does not block the client. However, the client needs to expose a callback URL, which receives a notification that the named-entity extraction has been completed. The NER web service sends the tagged XML file to the callback URL specified by the client.

The interaction is as follows:

  1. The client sends a NER request to the NER web service;
  2. The NER web service replies synchronously with the request ID (positive number) or an error code (negative number) and an error message;
  3. The NER web service processes the document;
  4. The output is sent back to the callback URL.

The above approach decouples the client from the NER server. It is the responsibility of the client to submit the initial request and to react to the callback from the NER server. The returned request ID can be used to correlate the original request with the callback it refers to.
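The correlation step described above can be sketched as follows. This is a minimal illustration, not part of the service: the in-memory `pending` store and the function names `register_reply` and `match_callback` are hypothetical, and it assumes the synchronous reply has already been parsed into an integer.

```python
from typing import Dict, Optional

# Hypothetical in-memory store: request ID -> label of the submitted document.
pending: Dict[int, str] = {}


def register_reply(reply_code: int, document_label: str) -> Optional[int]:
    """Remember an accepted request; return its ID, or None for an error code."""
    if reply_code > 0:
        # A positive number is the request ID assigned by the service.
        pending[reply_code] = document_label
        return reply_code
    # Anything else is treated as an error code (documented as negative).
    return None


def match_callback(request_id: int) -> Optional[str]:
    """Return (and forget) the document label the callback refers to."""
    return pending.pop(request_id, None)
```

A real client would probably persist this mapping so that callbacks arriving after a restart can still be matched.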

Accessing the web service

CEF Named Entity Recognition is currently available as a web service only, so some technical expertise is needed to use it. A relatively easy way to familiarize yourself with it is to use SoapUI.

To obtain the credentials needed to access the service, contact DGT-MT@ec.europa.eu.

Using the web service

The service is accessible via the following URL: https://language-tools.ec.europa.eu/NamedEntitiesWS/askNER

This is a RESTful web service that is called with the HTTP POST method and accepts JSON in the request body.

The structure of the JSON message used to send a request is shown below:


{
    "credentials" : {
        "application" : "MY_APPLICATION",
        "password" : "my_password"
    },
    "language" : "EN",
    "document" : {
        "content" : "VGhpcyBpcyBhIHRlc3Q=",
        "format" : "txt"
    },
    "callback" : "https://my_server:my_port/my_callback_endpoint"
}
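As an illustration, the request body above could be assembled and posted from Python roughly as follows. This is a sketch under assumptions: the helper names `build_request` and `send_request` are hypothetical, the `Content-Type` header is assumed rather than documented here, and the reply is returned as raw text because its exact format is not specified in this page.

```python
import base64
import json
import urllib.request

NER_URL = "https://language-tools.ec.europa.eu/NamedEntitiesWS/askNER"


def build_request(text, application, password, callback, language="EN"):
    """Build the JSON-serialisable request body for a plain-text document."""
    # The service expects the document content to be Base64-encoded.
    content = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "credentials": {"application": application, "password": password},
        "language": language,
        "document": {"content": content, "format": "txt"},
        "callback": callback,
    }


def send_request(body):
    """POST the body to the service and return the raw reply text."""
    request = urllib.request.Request(
        NER_URL,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the service accepts this content type for JSON bodies.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `build_request("This is a test", "MY_APPLICATION", "my_password", "https://my_server:my_port/my_callback_endpoint")` yields the same `content` value as the sample message, `VGhpcyBpcyBhIHRlc3Q=`.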

Definition of parameters

  • credentials :
    1. application : name of the client application. The service uses this parameter to check whether the application is an allowed client. Every application name must be authorized by the CEF service before it can be used.
    2. password : password associated with the application.

Example:

"credentials" : {
    "application" : "MY_APPLICATION",
    "password" : "password123"
}

If you do not have a dedicated application name and/or password, or if you encounter a problem with access rights, please contact eTranslation.

  • language : language code of the document.

Example: EN

  • document :
    1. content: Base64-encoded content of the document;
    2. format: format of the document. Accepted formats are odt, ods, odp, odg, ott, ots, otp, otg, rtf, doc, docx, xls, xlsx, ppt, ppts, pdf, txt, html, xml. MIME types are also accepted.

Example:

"document" : {
    "content" : "VGhpcyBpcyBhIHRlc3Q=",
    "format" : "txt"
}

"document" : {
    "content" : "............",
    "format" : "application/msword"
}

  • callback : URL that the service calls, with the HTTP POST method, at the end of the process. HTTPS URLs are also supported. The following parameters are included in the query string of the POST request:
    1. requestId: the request ID;
    2. content (optional): the Base64-encoded result, if the request succeeds;
    3. errorCode (optional): the error code, if the request fails;
    4. errorMessage (optional): the error message, if the request fails.

Please note that the result delivered by NER is a tagged XML file, encoded in Base64.

Example: http://my-client-server/my-client-context/
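On the client side, the callback parameters listed above could be read from the request URL along these lines. This is a hedged sketch: `parse_callback` is a hypothetical helper, it assumes the parameters arrive in the query string exactly as named above, and it returns the decoded result as bytes because the character encoding of the XML is not stated here.

```python
import base64
from typing import Any, Dict, Optional
from urllib.parse import parse_qs, urlsplit


def parse_callback(url: str) -> Dict[str, Any]:
    """Extract requestId, result and error details from a callback URL."""
    query = parse_qs(urlsplit(url).query)

    def first(name: str) -> Optional[str]:
        values = query.get(name)
        return values[0] if values else None

    request_id = first("requestId")
    error_code = first("errorCode")
    content = first("content")

    result = None
    if content is not None:
        # parse_qs turns "+" into a space; undo that in case the sender
        # did not percent-encode the Base64 text.
        result = base64.b64decode(content.replace(" ", "+"))

    return {
        "requestId": int(request_id) if request_id is not None else None,
        "result": result,  # tagged XML as bytes, or None on failure
        "errorCode": int(error_code) if error_code is not None else None,
        "errorMessage": first("errorMessage"),
    }
```

In a web framework or an `http.server` handler, the request path (including the query string) would be passed to this function and the outcome matched to the original request via `requestId`.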

Error codes

The list below shows all the error codes, with their descriptions, that the client may receive synchronously or asynchronously when a request is invalid or fails. Please note that an error code is always a negative number.


Code      Description                   Returned
-10000    Unknown error                 Synchronously or asynchronously
-20000    Language is missing           Synchronously
-20001    Language is not valid         Synchronously
-20002    Callback is missing           Synchronously
-20003    Bad callback protocol         Synchronously
-20004    Application is missing        Synchronously
-20005    Application is invalid        Synchronously
-20006    Password is missing           Synchronously
-20007    Wrong password                Synchronously
-20008    Format is missing             Synchronously
-20009    Document content is missing   Synchronously
-20010    The mime type is invalid      Synchronously
-30000    Error in the process          Asynchronously
-60000    Delivery error                Asynchronously
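A client may want to turn these codes into readable messages. The lookup below is a minimal sketch: the codes and descriptions are copied from the list above, while the `ERROR_CODES` name, the `describe_error` helper and the fallback text for unlisted codes are illustrative assumptions.

```python
# Code -> (description, how the code is delivered), copied from the list above.
ERROR_CODES = {
    -10000: ("Unknown error", "synchronous or asynchronous"),
    -20000: ("Language is missing", "synchronous"),
    -20001: ("Language is not valid", "synchronous"),
    -20002: ("Callback is missing", "synchronous"),
    -20003: ("Bad callback protocol", "synchronous"),
    -20004: ("Application is missing", "synchronous"),
    -20005: ("Application is invalid", "synchronous"),
    -20006: ("Password is missing", "synchronous"),
    -20007: ("Wrong password", "synchronous"),
    -20008: ("Format is missing", "synchronous"),
    -20009: ("Document content is missing", "synchronous"),
    -20010: ("The mime type is invalid", "synchronous"),
    -30000: ("Error in the process", "asynchronous"),
    -60000: ("Delivery error", "asynchronous"),
}


def describe_error(code: int) -> str:
    """Return a readable message for an error code, listed or not."""
    description, delivery = ERROR_CODES.get(
        code, ("Undocumented error code", "unknown")
    )
    return f"{code}: {description} ({delivery})"
```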


Entities Description

The following list shows the entity types identified in the data, with a brief description of each.


Tag                 Description
Person              Person names
NORP                Nationalities or religious or political groups
Date                Days/calendar dates (absolute or relative dates)
GPE                 Geopolitical entity (countries, cities, states)
FAC                 Buildings, airports, highways, bridges, etc.
ORG                 Companies, agencies, institutions, etc.
Work Of Art / Art   Titles of books, songs, etc.
Time                Time indicator
Cardinal            Numerals that do not fall under another type
Ordinal             "first", "second", etc.
Money               Monetary values, including unit
Event               Named hurricanes, battles, wars, sports events, etc.
Art                 Artifact
Nat                 Natural phenomenon
Law                 Documents made into laws
LOC                 Non-GPE locations, mountain ranges, bodies of water
Language            Any named language
Quantity            Measurements, as of weight or distance
Product             Objects, vehicles, foods, etc. (not services)
Percent             Percentage, including "%"