The EU-funded project READ (Recognition and Enrichment of Archival Documents) is developing technologies that can decipher handwritten documents, making life easier for historians, archivists, genealogists and other researchers. The project’s Transkribus platform, where documents can be uploaded, transcribed and shared, has been up and running since 2015. Participants can also use their own transcriptions of historical documents to train the software to read further texts in the same handwriting.

Transkribus in figures

  • The platform grew from 2,000 users in 2015 to more than 17,000 users in 2018.
  • Every day around 3,000 page images from historical documents are uploaded, processed with layout analysis, transcribed, recognised and searched.
  • In September 2018 users trained more than 140 models for reading handwriting, covering scripts and documents from more than 20 languages and alphabets, on the basis of more than 6,500 transcribed pages – the equivalent of around four person-years of work, accomplished in just one month.

A meeting of minds

In November 2018, the second Transkribus user conference took place: attendees from almost 20 countries made the trip to Vienna’s Technical University. They included representatives from archives and libraries, humanities scholars, computer scientists, and members of the general public. Besides meeting users in person and hearing about their needs and objectives, the READ project team had the chance to present its latest successes. Thanks to input from researchers at the Technical University Valencia, the University of Rostock, the Technical University Vienna and other institutions, the Transkribus software saw several improvements in the course of 2018: it is now better at analysing the layout of documents, spotting keywords, and recognising tables. The event also saw the introduction of a new version of the Handwritten Text Recognition module, which learns to transcribe documents with the help of existing transcriptions; it should generate 60-80% fewer errors than the previous version. Historical documents are on the way to being as easy to read as texts produced today.

Transkribus as a European Cooperative Society

The conference also offered an opportunity to officially announce the READ team’s plans for the future of Transkribus as an ongoing, self-sustaining platform. The project team believes that, while software may come and go, data will remain. It will therefore be important to base any future governance and business models on collaboration with users and to involve them not only as data providers, but also as data owners. The concept of a European Cooperative Society (SCE) was presented to the conference attendees by the legal advisor of the Austrian Raiffeisen Association, Markus Dellinger. The SCE will give all interested parties the chance to buy shares, to collaborate, and to take decisions in a democratic way, but also to benefit and profit directly from the business activities of the SCE. Transkribus users reacted positively to these plans, and the project team is confident that the new Cooperative Society will be set up in the coming months and start regular business right after the READ project formally comes to an end.

Would you like to learn more about Transkribus? Take a look at this short film about how it can help researchers read centuries-old books and other texts.