Digital Single Market
Digital Economy & Society

Language Technologies - project information

The European Digital Single Market is still very much hampered by language barriers. While much has been done in this area, further research is needed to develop technologies that meet the real needs of the language industry.

Language technologies, however, are not just about machine translation but also about extracting meaning from data and turning it into useful knowledge. Tools and services to analyse both structured data (text, documents) and unstructured data (human speech, social media content) are required in order to fully exploit the huge quantities of data available. Funding research in this area will help advance the field. The full range of topics covered is:

  • automated translation,
  • multilingual content authoring and management,
  • speech technology and interactive services,
  • content analytics,
  • language resources, and
  • collaborative platforms.
EU-funded projects
  • A European Community of SMEs built on Environmental Digital Content and Languages

    The INSPIRE Directive 2007/2/EC establishes an Infrastructure for Spatial Information in Europe, requiring large amounts of environmental digital content to be made accessible across Europe, resulting in a data pool that is expected to be of huge value for a myriad of value-added applications...
  • A service platform for aggregation, processing and analysis of urban and regional planning data

    Urban and regional planning data sets have not been aggregated so far, and thus it is very difficult to use them for any purpose other than printing or simple publishing by the authorities that created them...
  • Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation

    The lack of sufficient linguistic resources for many languages and domains is currently one of the major obstacles to further advancement of automated translation...
  • Annotation Resource Marketplace in the Cloud

    ANNOMARKET aims to revolutionise the text annotation market by delivering an affordable, open marketplace for pay-as-you-go, cloud-based extraction resources and services, in multiple languages. This project is driven by a commercially dominated consortium from 3 EU countries, with 43% of the budget assigned to SMEs. The key differentiating feature of ANNOMARKET is its open marketplace concept...
  • Applied Technology for Language-Aided CMS

    The advent of the Web revolutionized the way in which content is manipulated and delivered. As a result, digital content in various languages has become widely available on the Internet and its sheer volume and language diversity have presented an opportunity for embracing new methods and tools for content creation and distribution...
  • Automated Community Content Editing PorTal

    The use of machine translation (MT) is becoming much more pervasive. At the same time, Web 2.0 paradigms are democratising content creation - stressing the value of communities of users creating content for each other. However, right now these two trends are fairly incompatible. MT engines, even statistical engines, cannot produce acceptable results for community content due to the extreme variability within the content...
  • Baltic and Nordic Parts of the European Open Linguistic Infrastructure

    The META-NORD project aims to establish an open linguistic infrastructure in the Baltic and Nordic countries. Thus the main objectives of the META-NORD project as established under Objective 6.1 are: • provide a description of the national landscape in terms of language use; language-savvy products and services, language technologies and resources; main actors; public policies and programmes; prevailing standards and practices; current level of development, main drivers and roadblocks; and create ..
  • Bologna Translation Service

    There is a continuing and increasing need for educational institutes to provide course syllabi documentation and other educational information in English. Access to translated course syllabi and degree programmes plays a crucial role in the level to which universities effectively attract foreign students and, more importantly, has an impact on international profiling. However, to present all education information in English is a major challenge for most higher education institutes...
  • Bridges Across the Language Divide

    Today Europe is facing larger and more critical language challenges than ever before. The production of multilingual content now far outpaces our ability to translate it by human effort and we must turn to automatic methods to cope. Thus, effective and innovative alternatives must be provided to Europe's citizens and businesses...
  • CEntral and South-east europeAn Resources

    Human language technologies crucially depend on language resources and tools that are usable, useful and available...
  • Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation

    The CASMACAT project will build the next generation translator's workbench to improve productivity, quality, and work practices in the translation industry. We will carry out cognitive studies of actual unaltered translator behaviour based on key logging and eye tracking...
  • Commercially empowered Linked Open Data Ecosystems in Research

    Linked Open Data (LOD) shows enormous potential in becoming the next big evolutionary step of the WWW. However, this potential remains largely untapped due to missing usage and monetisation strategies. CODE's vision is to establish the foundation for a web-based, commercially oriented ecosystem for Linked Open Data. This ecosystem establishes a sustainable and commercial value-creation-chain among traditional (e.g. data provider and consumer) and non-traditional (e.g...
  • Computing Veracity Across Media, Languages, and Social Networks

    Social media poses three major computational challenges, dubbed by Gartner the 3Vs of big data: volume, velocity, and variety. Content analytics methods have faced additional difficulties, arising from the short, noisy, and strongly contextualised nature of social media. In order to address the 3Vs of social media, new language technologies have emerged, e.g...
  • Cross-lingual Knowledge Extraction

    The goal of the X-LIKE project is to develop technology to monitor and aggregate knowledge that is currently spread across global mainstream and social media, and to enable cross-lingual services for publishers, media monitoring and business intelligence. In terms of research contributions, the aim is to combine insights from several scientific areas to advance cross-lingual text understanding...
  • Data Supply Chains for Pools, Services and Analytics in Economics and Finance

    DOPA will enable European SMEs to become key players in the global data economy, as the impacts of the DOPA RTD activities will likely materialize on both the supply and the demand side of B2B vertical market segments of data-related services. DOPA has been designed to provide solutions to gaps observed in the EU environment, pointing out the need to develop and implement a source and exploitation platform for economic and financial information in Europe that would altogether: - Have the ri..
  • Democratizing Fleet Management

    GPS positioning devices are becoming a commodity sensor platform with the emergence and popularity of smartphones and ubiquitous networking. While the positioning capability has been exploited in location-based services, its spatiotemporal cousin, tracking, has so far only been considered in costly and complex fleet management applications...
  • Demonstrating the potential of a multilingual Web portal for Sustainable Agricultural & Environmental Education

    The Organic.Lingua project will extend the current Organic.Edunet portal to fill the gaps in multilingual support and cross-language resource organization and search, by significantly expanding its linguistic coverage. In particular, it aims at analyzing and re-engineering the service infrastructure and the related facilities and business models, in order to be able to support multilingual access and use more widely and effectively...
  • Distant-speech Interaction for Robust Home Applications

    The DIRHA project addresses the development of voice-enabled automated home environments based on distant-speech interaction in different languages. A distributed microphone network is installed in the rooms of a house in order to monitor selectively acoustic and speech activities observable inside any space, and to eventually run a spoken dialogue session with a given user in order to implement a service or to have access to appliances and other devices...
  • Educational curriculum for the usage of Linked Data

    Linked Data has established itself as the de facto means for the publication of structured data over the Web, enjoying amazing growth in terms of the number of organizations committing to use its core principles for exposing and interlinking data sets for seamless exchange, integration, and reuse. More and more ICT ventures offer innovative data management services on top of Linked (Open) Data, creating a demand for data practitioners possessing skills and detailed knowledge in this area...
  • EUMSSI - Event Understanding through Multimodal Social Stream Interpretation

    The main objective of EUMSSI is to develop technologies for identifying and aggregating data presented as unstructured information in sources of very different natures (video, image, audio, speech, text and social context), including both online (e.g. YouTube) and traditional media (e.g. audiovisual repositories), and for dealing with information of very different degrees of granularity...