CEF Named-entity recognition
What is named-entity recognition?
Named-entity recognition (NER) is a tool that aims to identify named entities (people, places, things and so on) in a text.
It extracts a list of the people, places or things from documents submitted to it.
This can be useful in many information processing tasks. For example, when researching the EU you could scrape the web and find all the pages that mention "Ursula von der Leyen" or perhaps, if you have a taste for controversy, "Donald Trump".
It can also be used to monitor media interest in individuals or places that are currently in the news.
For machine translation, it can be used to identify individuals or places and ensure that the official or preferred translations are used.
This system will generate a marked-up list - what you then do with it is limited only by your imagination!
How to use the CEF Named-entity recognition web service
The CEF Named-entity recognition service expects a base64-encoded file as input and returns a tagged XML file as output.
The CEF Named-entity recognition web service is asynchronous. This means that the client sends a NER request and is notified once the document has been processed. In this way, calling the web service does not block the client. However, the client needs to expose a callback URL which will receive a notification that the named-entity extraction has been completed. The NER web service sends the tagged XML file to the callback URL specified by the client.
The interaction is as follows:
- The client sends a NER request to the NER web service;
- The NER web service replies synchronously with the request ID (positive number) or an error code (negative number) and an error message;
- The NER web service processes the document;
- The output is sent back to the callback URL.
The above approach decouples the client from the NER server. It is the responsibility of the client to submit the initial request and to react to the callback from the NER server. The request ID returned can be used to correlate the original request with the callback it refers to.
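The correlation between a request and its callback can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class name `PendingRequests` and its methods are hypothetical and not part of the service, and a real client would probably persist the mapping rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingRequests:
    """Hypothetical helper: remembers which document each request ID refers to."""

    _pending: Dict[int, str] = field(default_factory=dict)

    def register(self, reply_code: int, document_name: str, message: str = "") -> int:
        # The synchronous reply is either a request ID (positive number)
        # or an error code (negative number) with an error message.
        if reply_code < 0:
            raise RuntimeError(f"NER request rejected ({reply_code}): {message}")
        self._pending[reply_code] = document_name
        return reply_code

    def resolve(self, request_id: int) -> Optional[str]:
        # Called when the callback arrives; returns None for an unknown ID.
        return self._pending.pop(request_id, None)


tracker = PendingRequests()
tracker.register(12345, "report.docx")   # 12345 stands in for a returned request ID
print(tracker.resolve(12345))            # prints "report.docx"
```

Removing the entry on `resolve` means a repeated or unexpected callback for the same ID is detected as unknown instead of being processed twice.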
Accessing the web service
CEF Named-entity recognition is currently available as a web service only, so some technical expertise is needed to use it. A relatively easy way to familiarize yourself with it is to use SoapUI.
To obtain the credentials needed to access the service, contact DGT-MT@ec.europa.eu.
Using the web service
The service is accessible via the following URL: https://language-tools.ec.europa.eu/NamedEntitiesWS/askNER
This is a RESTful web service that is called with the HTTP POST method and accepts JSON in the request body.
The JSON request message is built from the parameters defined below.
Definition of parameters
- credentials:
  - application: name of the client application. The service uses this parameter to check whether the application is an allowed client. Every application name must be authorized by the CEF service before it can be used.
  - password: the password associated with the application.
  If you do not have a dedicated application name and/or password, or if you encounter a problem with access rights, please contact eTranslation.
- language: ISO 639-1 code of the original language of the document to process. Example: EN
- document:
  - content: base64-encoded content of the document;
  - format: format of the document. Accepted formats are odt, ods, odp, odg, ott, ots, otp, otg, rtf, doc, docx, xls, xlsx, ppt, ppts, pdf, txt, html, xml. MIME types are also accepted.
- callback: the URL that is called at the end of the process, once NER has finished. This URL is called with the HTTP POST method. The following parameters are included in the query string of the POST request:
  - requestId: the request ID;
  - content (optional): the base64-encoded result, if the request succeeds;
  - errorCode (optional): the error code, if the request fails;
  - errorMessage (optional): the error message, if the request fails.
  Please note that the result delivered by NER is a tagged XML file encoded in base64. The callback parameter also supports https URLs.
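As an illustration of how these parameters might be assembled, here is a minimal Python sketch. The key names come from the parameter list above, but the exact nesting of the JSON message (`application` and `password` inside `credentials`, `content` and `format` inside `document`) is inferred from that list and should be treated as an assumption. The application name, password and callback URL are hypothetical placeholders, and the format of the synchronous reply is not assumed: it is returned as raw text.

```python
import base64
import json
import urllib.request

NER_URL = "https://language-tools.ec.europa.eu/NamedEntitiesWS/askNER"


def build_ner_request(application, password, language, raw_bytes, doc_format, callback_url):
    """Assemble the request body (nesting is an assumption, see the text above)."""
    return {
        "credentials": {"application": application, "password": password},
        "language": language,
        "document": {
            # The document content must be sent base64-encoded.
            "content": base64.b64encode(raw_bytes).decode("ascii"),
            "format": doc_format,
        },
        "callback": callback_url,
    }


def post_ner_request(body, url=NER_URL):
    """POST the JSON body and return the raw text of the synchronous reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


body = build_ner_request(
    application="my-app",                                     # hypothetical
    password="secret",                                        # hypothetical
    language="EN",
    raw_bytes=b"Ursula von der Leyen visited Paris.",
    doc_format="txt",
    callback_url="https://client.example.org/ner-callback",   # hypothetical
)
print(json.dumps(body, indent=2))
```

The sketch only builds and prints the message. Calling `post_ner_request(body)` would submit it, which requires valid credentials and a reachable callback URL.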
The table below lists all the error codes, with their descriptions, that the client may receive synchronously or asynchronously when a request is invalid or fails. Please note that an error code is always a negative number.
| Error code | Description | Returned |
|---|---|---|
| -10000 | Unknown error | Synchronously or asynchronously |
| -20000 | Language is missing | Synchronously |
| -20001 | Language is not valid | Synchronously |
| -20002 | Callback is missing | Synchronously |
| -20003 | Bad callback protocol | Synchronously |
| -20004 | Application is missing | Synchronously |
| -20005 | Application is invalid | Synchronously |
| -20006 | Password is missing | Synchronously |
| -20008 | Format is missing | Synchronously |
| -20009 | Document content is missing | Synchronously |
| | The MIME type is invalid | |
| -30000 | Error in the process | Asynchronously |
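On the client side, the callback parameters and the error codes above could be handled along the following lines. This is a sketch, not the service's own code: the function name `parse_callback` and the shape of the returned dictionary are hypothetical, the query string is assumed to be URL-encoded in the usual way, and the decoded XML is assumed to be UTF-8. The MIME type error is left out of the lookup table because its code is not given above.

```python
import base64
from urllib.parse import parse_qs

# Descriptions copied from the error table; used when no errorMessage is sent.
ERROR_DESCRIPTIONS = {
    -10000: "Unknown error",
    -20000: "Language is missing",
    -20001: "Language is not valid",
    -20002: "Callback is missing",
    -20003: "Bad callback protocol",
    -20004: "Application is missing",
    -20005: "Application is invalid",
    -20006: "Password is missing",
    -20008: "Format is missing",
    -20009: "Document content is missing",
    -30000: "Error in the process",
}


def parse_callback(query_string):
    """Turn the callback query string into a small result dictionary."""
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    request_id = int(params["requestId"])

    if "errorCode" in params:
        code = int(params["errorCode"])
        message = params.get("errorMessage") or ERROR_DESCRIPTIONS.get(
            code, "Unrecognised error code"
        )
        return {"requestId": request_id, "ok": False,
                "errorCode": code, "errorMessage": message}

    # Success: the tagged XML arrives base64-encoded (UTF-8 is an assumption).
    xml = base64.b64decode(params["content"]).decode("utf-8")
    return {"requestId": request_id, "ok": True, "xml": xml}


result = parse_callback("requestId=42&errorCode=-30000")
print(result["ok"], result["errorMessage"])
```

The `requestId` in the result is what the client would use to match the callback to the request it originally submitted.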
The following list gives the types of entity identified in the data, with a brief description of each:
- Nationalities or religious or political groups
- Days/calendar dates – absolute or relative dates
- Geopolitical entity – countries, cities, states
- Buildings, airports, highways, bridges, etc.
- Companies, agencies, institutions, etc.
- Work of art – titles of books, songs, etc.
- Numerals that do not fall under another type
- “first”, “second”, etc.
- Monetary values, including unit
- Named hurricanes, battles, wars, sports events, etc.
- Documents made into laws
- Non-GPE locations, mountain ranges, bodies of water
- Any named language
- Measurements, as of weight or distance
- Objects, vehicles, foods, etc. (not services)