Multilingual Media Monitoring and Text Analysis - Challenges for highly inflected languages

Abstract: 

The European Commission's Europe Media Monitor (EMM) family of applications helps users monitor multilingual written online media for information on a wide variety of subject domains. Apart from gathering an average of 175,000 news articles per day in up to 73 languages and classifying them, the EMM applications apply a number of text mining and processing tools for about twenty languages. The text processing tools include news clustering, information extraction and disambiguation (persons, organisations, locations, quotations, events), matching of name variant spellings, topic detection and tracking, cross-lingual news cluster linking, opinion mining, multi-document summarisation, and more. Developing these tools is particularly challenging for highly inflected languages, such as those of the Slavic and the Finno-Ugric language families. The speaker will thus focus part of his talk on insights regarding the treatment of highly inflected languages, especially regarding information extraction and multi-label document classification. EMM is freely accessible to the public via http://emm.newsbrief.eu/overview.html.

Authors: 
STEINBERGER Ralf, PAJZS Julia, STEINBERGER Josef, EHRMANN Maud, TURCHI Marco, EBRAHIM Mohamed
Publication Year: 
2013
Type:

Appears in Collections: 
Institute for the Protection and Security of the Citizen
Science Areas
JRC Institutes
Publisher: 
Springer Verlag
ISBN: 
978-3-642-40584-6
ISSN: 
0302-9743
Citation: 
Text, Speech and Dialogue. 16th International Conference, TSD 2013. Proceedings p. 22-33