EMM: Supporting the Analyst by Turning Multilingual Text into Structured Data

All information-seeking professionals need to sift through large amounts of text to retrieve the information they need and to stay up to date with developments in their field. Language Technology tools can make the analyst’s work more efficient by increasing the amount of data analysed and by speeding up the process. Software tools applied to big data may additionally provide a bird’s-eye view of trends and data distributions that are not easily visible to the human reader. The European Commission’s Joint Research Centre (JRC) has developed the Europe Media Monitor (EMM) family of applications, which aims to provide solutions for the daily media monitoring needs of a large variety of users working in diverse fields. EMM gathers and analyses hundreds of thousands of news articles every day in up to seventy languages. Because of the large scale of the effort, EMM can track topics, detect trends and act as an early warning tool. In this chapter, we present the functionality and the benefits of EMM’s news analysis capacity, but we also aim to make the reader aware of the potential dangers of automated large-scale media monitoring. The EMM team makes a number of linguistic tools and resources freely available, which information specialists can use to improve their own analysis of large sets of textual data.