The LDS vision

Philippe Gelin, Head of Sector Multilingualism at DG/CONNECT, European Commission, shares with our readers some insights into the present and future of the European language data market and the role that the Common European Language Data Space plays in its realisation.

© Babylonia Creative Affairs Bureau SRL

date:  10/10/2023

Mr. Gelin, what does the Common European Language Data Space look like in your vision?

I am convinced that the Common European Language Data Space (LDS for short) is, first of all, one of the biggest opportunities for Europe to position itself in the current race to deploy new AI-based services. It is a fantastic chance to ensure that these emerging services genuinely reflect European languages and cultures. In view of the capabilities of these tools, I can’t overstate the importance of this opportunity.

In the long term, I see the LDS as a beehive where data continuously flows in and out, enabling the creation of services “à la carte”, fully taking into account the diverse needs of European users.

Where does the LDS project stand in the context of other prominent initiatives for the collection of language resources, such as ELRC?

The LDS is a total rethinking of what a language data market should be. The context has changed with recent and upcoming legislation and with technological advancements. Of course, one has to build on valuable existing language resource repositories like ELRC, but it's also important to note that ELRC was launched back in 2015. In the rapidly evolving digital landscape, a decade can feel like a century.

It’s now time to think more about versatility, security, granularity and a continuous flow of data. Additionally, the scope should extend beyond text to encompass other modalities such as images and video.

What is the relationship between the LDS and the EDICs?

In a nutshell, while a Data Space primarily focuses on data, an EDIC (European Digital Infrastructure Consortium) is a new legal tool that allows Member States and the Commission to work together on a key digitalisation project.

In our specific context, on the one hand, the Language Data Space is designed to create a marketplace around language data. On the other hand, the Alliance for Language Technology EDIC aims to establish a European ecosystem in Language Technologies. While this ecosystem starts with a focus on data and will build on the Language Data Space, it will seek to include all the stakeholders needed to develop large language models, to fine-tune them for specific applications, and to support their deployment in relevant markets.