The speech machines

With 23 official languages, or 253 different language pairs, it is hardly surprising that Europe is at the centre of developments in machine translation. While computers continue to make slow progress in translating texts, direct “speech-to-speech” translation poses another problem entirely. Here the complete language-processing chain, with all its subtleties, is the subject of research being pursued by a number of innovative projects.


As a consequence of European enlargement, the Commission manages the world’s biggest translation service with a budget of over € 1.1 billion a year. (1) In addition to the 1 750 translation professionals needed to meet the Union’s legislative and official needs, external resources are required for many other types of translation and publication. (2)

The private sector, including the audiovisual industry and commercial and personal services, is also a major customer of translation services, while the technology available to support these services falls far short of meeting real user expectations. The global translation market is faced with the limitations of conventional software that translates “word by word” and has shown little progress for many years now, except for a few language pairs for which more sophisticated approaches have been introduced (grammatical analyses, bilingual glossaries, etc.). Given the many grammatical and semantic ambiguities of language, machine translation remains a hazardous enterprise and the source of many errors of interpretation.

“Statistical” translation

Does the French word livre mean a book, a unit of weight or currency, or a conjugated form of the verb livrer? In English, does book mean a written or printed work or the action of reserving a seat? It is context that tells us and, to take context into account, the new approaches to machine translation are based on statistical methods. A computer is unable to “understand” as such, but by virtue of its calculating power it is able to find in an instant the best possible solution within a corpus containing millions of translated sentences.
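The core idea can be sketched in a few lines of Python. The words and counts below are invented for illustration and are not drawn from any real corpus; the point is simply that the translation seen most often alongside the surrounding context words wins:

```python
# Minimal sketch of context-based disambiguation with made-up counts.
# Idea: pick the translation of an ambiguous word that co-occurs most
# often with its neighbouring words in a corpus of translated sentences.
from collections import Counter

# Hypothetical (context word, translation) counts for the French word "livre".
corpus_counts = Counter({
    ("lire", "book"): 950,       # "lire un livre" -> read a book
    ("sterling", "pound"): 430,  # "livre sterling" -> pound sterling
    ("demi", "pound"): 120,      # "une demi-livre" -> half a pound
    ("lire", "pound"): 3,        # noise: rare co-occurrence
})

def best_translation(context_word: str) -> str:
    """Return the translation seen most often next to this context word."""
    candidates = {t: n for (c, t), n in corpus_counts.items() if c == context_word}
    return max(candidates, key=candidates.get)

print(best_translation("lire"))      # "book"
print(best_translation("sterling"))  # "pound"
```

A real engine scores whole sentences rather than single neighbouring words, but the principle is the same: the computer does not “understand”, it counts.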

The approach adopted by the European consortium TC-Star in developing its translation engine uses almost 3.5 million phrase pairs for English-Spanish translations and eight million pairs for Mandarin-English. “To find this best solution, the engine searches the database for the source-translation pair that is most common statistically,” explains one of the project partners, Khalid Choukri, director of the Evaluations and Language resources Distribution Agency (ELDA). “This search generates a tree of possible choices, which is progressively ‘trimmed’ by applying a set of rules that sorts the candidates on the basis of grammatical, syntactical and lexicographical criteria, or simply the number of words. The final result is the one that obtains the best statistical score.”
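The “tree of choices, progressively trimmed” that Choukri describes can be illustrated with a tiny beam search over a hypothetical phrase table. The phrases, probabilities and the crude bigram language model are all invented; this is a sketch of the technique, not of TC-Star’s actual engine:

```python
# Toy phrase-based translation: expand a tree of candidate translations,
# score each branch, and keep only the best few at every step ("trimming").
import math

phrase_table = {  # source phrase -> list of (translation, probability), invented
    "la casa": [("the house", 0.7), ("the home", 0.3)],
    "blanca": [("white", 0.9), ("blank", 0.1)],
}
lm_bigrams = {("house", "white"): 0.02, ("home", "white"): 0.01}  # invented

def translate(source_phrases, beam_width=2):
    beams = [([], 0.0)]  # each hypothesis: (words so far, log score)
    for src in source_phrases:
        new_beams = []
        for words, score in beams:
            for tgt, p in phrase_table[src]:
                s = score + math.log(p)
                if words:  # add a crude language-model score for fluency
                    s += math.log(lm_bigrams.get((words[-1], tgt.split()[0]), 1e-4))
                new_beams.append((words + tgt.split(), s))
        # "Trim the tree": keep only the best-scoring hypotheses.
        beams = sorted(new_beams, key=lambda b: b[1], reverse=True)[:beam_width]
    return " ".join(beams[0][0])

print(translate(["la casa", "blanca"]))  # "the house white"
```

A production system adds reordering models and far richer scoring rules, but the final answer is chosen the same way: the branch with the best statistical score survives the trimming.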

Towards a vocal revolution

The innovation developed by this ambitious project does not only concern improved engines for traditional text translation. TC-Star is also looking much further ahead and wants to develop revolutionary products for the real-time translation of spoken language. The process is a complex one and involves advanced research in the field of voice recognition and synthesis. It starts with recording the flow of words and segmenting it, to separate the sequences of words spoken from background noise and to distinguish the voices of different speakers. These word segments are then transcribed as phoneme chains and subsequently decoded using a language model and a phoneme dictionary. This produces a text that passes to the translation engine. The voice synthesis module then reconstitutes the result. “The module used to do this draws on a vast corpus of phoneme recordings that permits many different intonations and durations for a single phoneme,” continues Khalid Choukri. “A set of rules makes it possible to select from this the most appropriate word segment, taking into account punctuation and information supplied by the voice recognition module (hesitations, false starts, ungrammatical locutions, etc.). The final result is thus an artificial voice that is fluid, expressive and respects the characteristics of the source speaker.”
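The middle stages of this chain, turning a phoneme chain into words and then translating them, can be mocked up in miniature. The phoneme dictionary, the greedy decoder and the word-for-word “translation engine” below are all stand-ins invented for illustration; real systems use statistical models at every step:

```python
# Toy pipeline mirroring the stages described above:
# phoneme chain -> decoded words -> translated words.
# All data is invented; the phonemes loosely imitate French "bonjour monde".

phoneme_dict = {  # hypothetical phoneme-chain -> word entries
    ("b", "o", "~", "z", "u", "r"): "bonjour",
    ("m", "o", "~", "d"): "monde",
}
translations = {"bonjour": "hello", "monde": "world"}  # stand-in MT engine

def decode(phoneme_chain):
    """Greedy longest-match decoding of a phoneme chain into words."""
    words, i = [], 0
    while i < len(phoneme_chain):
        for j in range(len(phoneme_chain), i, -1):
            if tuple(phoneme_chain[i:j]) in phoneme_dict:
                words.append(phoneme_dict[tuple(phoneme_chain[i:j])])
                i = j
                break
        else:
            i += 1  # skip unrecognised phonemes (hesitations, noise)
    return words

def speech_to_speech(phoneme_chain):
    text = decode(phoneme_chain)                         # recognition
    translated = [translations.get(w, w) for w in text]  # translation
    return " ".join(translated)  # a synthesis module would then voice this

print(speech_to_speech(["b", "o", "~", "z", "u", "r", "m", "o", "~", "d"]))
```

In the real chain, the language model arbitrates between competing decodings and the synthesis stage re-applies the intonation cues that recognition extracted; here each stage is reduced to a dictionary lookup to make the data flow visible.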

Ultimately, the voice synthesis system could also be pre-programmed with the voice characteristics of a given person. The voice rendition would then be as close as possible to the original in terms of intonation, inflection and diction. Although TC-Star is most interested in supplying the European authorities with an effective speech translation system, it is not ignoring the many possibilities for applications of benefit to the general public. These include the translation of television programmes, incorporation in telephony and even the creation of small “portable translators”. The very optimistic Khalid Choukri hopes that these technologies will start to spread within five to ten years.

François Rebufat

  1. Figure cited by Karl-Johan Lönnroth, Director-General of the Translation DG – see Human Language technologies for Europe - www.tc-star.org/pubblicazioni/ITC_francese.pdf
  2. In this respect, the EuroMatrix inter-university project has just started up; it will look specifically at the language needs of the enlarged Europe and act as an observatory of progress in machine translation applied to the 23 EU languages. www.euromatrix.net


Read More

“Self-communicating” objects

One of the fields of application in which language technologies are progressing the most is that of domestic appliances operated by voice commands. The European project Talk (Tools for Ambient Linguistic Knowledge) is developing systems for the recognition of voice commands adapted to everyday life. As with the TC-Star research, the principal challenge is to identify words according to the way they are pronounced by different people and then to provide a translation in the form of commands that can be acted upon. For each object concerned, Talk refers to a semantic dictionary in which each entry corresponds to an action to be carried out.

When analysing the sentence spoken, it identifies the semantic structures it recognises and carries out the corresponding actions. An interactive programme of this kind is capable of learning, as it is the user who vocally introduces his own semantic rules, enabling the machine to refine its relevance criteria.

Looking further to the future, the project ECAgents (Embodied and Communicating Agents) is interested in developing structures for “communication between electronic agents”, enabling them to interact directly with their environment and communicate among themselves or with people. The very ambitious objective is to extend these functionalities to contemporary devices (telephones, wireless connections, household robots, etc.) to create “self-communicating tools”. Innovations that must be careful not to create a cacophony of voices…


To find out more