Why should I share data in the Language Data Space?
date: 13/11/2023
Over the last few years, the importance of AI-based tools has grown significantly, leading to considerable changes in our daily lives and contributing to new trends and to a greater availability and uptake of language-centric AI in general (e.g., ChatGPT, Bing, or speech recognition as in Siri or Alexa).
For the creation of state-of-the-art Language Technology (LT) applications, language data plays a crucial role. Many organisations, however, face challenges in collecting the amount of language data required to develop competitive language-centric AI. Data sharing can thus be a solution and is increasingly considered the best way towards truly sustainable language data management, fostering both research and innovation.
Within this context, the Common European Data Spaces are becoming more and more relevant, as they can ensure that more data becomes available for use in the economy, society and research, while the companies and individuals generating the data retain control over it.
A marketplace for language data
The European Language Data Space (LDS) aims to create a genuine single European market for and around language resources. Unlike past initiatives such as ELRC, the LDS will not only offer a free sharing and exchange platform for research, but also grant stakeholders easy access to (free or commercial) high-quality data. Additionally, it will enable them to monetise their own data, facilitate the exchange of best practices and insights with other stakeholders, and support the search for new partners or collaboration opportunities through a single, EU-compliant platform.
But why should stakeholders, such as those from the publishing, LT or press industries, share business-relevant data with potential competitors?
Often, companies don’t realise the value of the language resources they produce on a daily basis (e.g., translations of fiction and non-fiction), especially if these don’t belong to their core business. By sharing the data in the LDS, data owners can maximise the revenues from these resources while remaining in control of them. Thus, the LDS will not interfere with the primary business of the companies, but will rather breathe new life into such data sets and reward the efforts invested in their creation by giving the owners the opportunity to re-sell them under different conditions.
The LDS will mark a true turning point in the approach to the collection of language resources. The increased availability of high-quality data will empower European businesses to compete globally with the data-driven language services provided by US or Chinese companies, while fostering trust throughout the whole language data sharing process.