LetsMT!: Platform for Online Sharing of Training Data and Building User Tailored MT
Exploiting the potential of SMT technologies
In recent years, statistical machine translation (SMT) has become the leading paradigm for machine translation. SMT systems are built by analyzing large volumes of parallel corpora and learning translation models from this data, so their quality depends largely on the amount of training data available. Because most parallel data exists for major languages, SMT systems for larger languages are of much better quality than systems for smaller languages. The complex linguistic structure of many smaller languages widens this gap further. Languages such as Latvian, Lithuanian and Croatian, to name just a few, have complex morphology and free word order, and learning this complexity from corpus data requires much larger volumes of training data.

Current systems are built on data accessible on the web, which is just a fraction of all parallel texts. Most parallel texts still reside in the local systems of corporations, public and private institutions, and on the desktops of individual users. In addition, the cost and know-how required for building custom MT solutions deter many small-to-medium companies from using MT technologies.

To fully exploit the potential of existing open SMT technologies, we propose to build an innovative online collaborative platform for data sharing and MT building. This platform will support the upload of public as well as proprietary MT training data and the building of multiple MT systems, public or proprietary, by combining and prioritizing this data. The project will extend the use of existing state-of-the-art SMT methods, applying them to data supplied by users to increase the quality, scope and language coverage of machine translation.

LetsMT! services will focus on two application scenarios: free online translation of business and financial news, and application in the localisation and translation industry. At the same time, the services will be of interest to a variety of users: web users in general, speakers of less-covered languages, academia, etc.

For the localisation and translation industry, LetsMT! will provide facilities for training SMT systems on users' own data and generating custom SMT solutions for localisation service providers as well as enterprises and organizations with multilingual translation needs. Integration of SMT solutions into professional productivity environments will be provided.

For readers of business and financial news, LetsMT! will provide free and instant MT services with an emphasis on less-covered languages. Their quality will be ensured by applying a large pool of domain-specific resources and by subsequent evaluation cycles.

LetsMT! services will be accessible through the Web portal for free translation of texts, through a translation widget provided for inclusion in a web page, through browser plug-ins for quick access to translation, and through integration in professional translation tools.
VIENIBAS GATVE 75 A
REPUBLIC OF LATVIA
Coordinator: TILDE SIA, REPUBLIC OF LATVIA
|TILDE SIA|REPUBLIC OF LATVIA|
|MORAVIA IT AS|CZECH REPUBLIC|
|SVEUCILISTE U ZAGREBU FILOZOFSKI FAKULTET - UNIVERSITY OF ZAGREB, FACULTY OF HUMANITIES AND SOCIAL SCIENCES|REPUBLIC OF CROATIA|
|ZOOROBOTICS BV|THE NETHERLANDS|
|THE UNIVERSITY OF EDINBURGH|UNITED KINGDOM|
Last update: 13/12/2011