Making less common EU languages more accessible

An EU-funded project has developed a cost-efficient, high-quality machine translation tool for less widely spoken European languages such as Croatian, facilitating communication and helping smaller companies enter new markets.

Published: 16 April 2018  
Related theme(s) and subtheme(s)
Human resources & mobility: Marie Curie Actions
Information society
Innovation
Research policy: Seventh Framework Programme
Countries involved in the project described in the article
Croatia  |  Greece  |  Ireland  |  Spain

Image: © Olivier Le Moal - fotolia.com

With 24 official languages and a range of regional ones, communicating in the European Union can sometimes be a challenge. This is particularly true for less common languages such as Croatian.

For these languages, where the required resources to develop a modern machine translation – or MT – system are scarce, one efficient solution is to build such software automatically using free online material.  

To help fill this gap, the EU-funded ABU-MATRAN project set out to develop a cost-efficient, high-quality and web-based MT tool.

“Although current MT approaches work in any language, they first need access to such resources as vast amounts of sentences in both the source and target languages,” says Antonio Toral, formerly with ABU-MATRAN project coordinator Dublin City University in Ireland and now at the University of Groningen in the Netherlands. “For Europe’s under-resourced languages, these necessary resources may not exist, and acquiring them by manual translation would be too costly.”

New language, new system

The idea was born when Croatia officially joined the EU in 2013 – bringing with it a new official language. At that time, ABU-MATRAN researchers developed an online MT system for English-Croatian based on publicly available resources.

The system uses a set of acquisition tools that automatically gather data from different types of resources, such as dictionaries. These tools rely primarily on web crawlers that pull the information from the internet.
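
The article does not describe the project's acquisition tools in detail, so the following is only a minimal sketch of one step such a pipeline commonly needs: cleaning crawled sentence pairs before they are used to train an MT system. The function name `filter_sentence_pairs`, its parameters and the sample data are hypothetical and are not taken from ABU-MATRAN.

```python
def filter_sentence_pairs(pairs, max_ratio=2.0, min_tokens=1):
    """Keep plausible (source, target) sentence pairs from crawled text.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        src_len, tgt_len = len(source.split()), len(target.split())
        # Drop pairs where either side is empty or too short.
        if src_len < min_tokens or tgt_len < min_tokens:
            continue
        # Drop pairs whose lengths differ too much to be translations.
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue
        # Drop duplicates, ignoring letter case.
        key = (source.lower(), target.lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append((source, target))
    return kept


# Invented sample of English-Croatian pairs as a crawler might return them.
crawled = [
    ("Good morning", "Dobro jutro"),
    ("Good morning", "Dobro jutro"),  # exact duplicate
    ("Welcome to Zagreb", "Dobrodošli u Zagreb"),
    ("Click here", ""),  # missing translation
    ("Yes", "Ovo je vrlo dugačka rečenica bez veze s izvorom"),  # length mismatch
]
clean = filter_sentence_pairs(crawled)
```

In this sketch only the first and third pairs survive. A real pipeline would likely add language identification and sentence alignment, which are left out here.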

“The ABU-MATRAN system was the first translator for these languages based on free, open-source technologies and immediately helped reduce the time and costs associated with translation between the two languages,” says Toral. “Using only datasets acquired in the project and publicly available MT machinery, we successfully built a system that rivals those of large IT corporations for this language pair.”

The project consortium, consisting of partners from industry and academia, started by identifying existing research tools not yet ready to be put on the market.

“We then worked together to identify industry needs, improve existing tools and prepare them for commercialisation,” says Toral. “In doing so, we also identified new needs that led to new research and solutions for addressing these needs.”

More languages added

This initial English-Croatian MT system was gradually improved by implementing new, more efficient translation techniques such as Neural Machine Translation (NMT). The project also developed a unique Croatian MT system for tourism.

From there, the project began to prepare for commercialisation, continuing to add other South Slavic languages, including Bosnian, Serbian and Slovenian. The techniques developed for these were then applied to other language pairs – English-Finnish, Spanish-Catalan and Spanish-Basque.

“By expanding to these other European languages, we demonstrated that the ABU-MATRAN system is applicable to very different language types,” says Toral. “Most importantly, the tools and techniques developed in this project have drastically reduced the cost of developing required language resources, thus lowering the barriers for SMEs to enter new markets.”

All results are publicly available under free/open-source licenses.

ABU-MATRAN received funding through the EU’s Marie Skłodowska-Curie actions programme.

Project details

  • Project acronym: ABU-MATRAN
  • Participants: Ireland, Spain, Croatia, Greece
  • Project N°: 324414
  • Total costs: € 1 045 965
  • EU contribution: € 1 045 965
  • Duration: From January 2013 to December 2016
