Last Update: 16-04-2014  
Related category(ies):
Information society  |  Success stories

 

Countries involved in the project described in the article:
Bulgaria  |  Germany  |  Israel  |  Netherlands  |  Poland  |  Slovenia  |  United Kingdom
Digitising the past

Although millions of books are scanned and put online every year, making old documents and texts available on the web is a difficult and painstaking process.

Project IMPACT, which stands for Improving Access to Text, is focused on making the process easier.

Project IMPACT director Hildelies Balk explained: “The problem with turning an historic document into a machine readable text is that it is so very old, everything is different from a modern document, it has old fonts, old words and a very difficult layout.”

Once scanned, such documents are left full of errors, because computers struggle to read old texts with strange layouts, fonts and spellings.

Clemens Neudecker, technical manager for European projects at Koninklijke Bibliotheek, showed us one example: “This is the Principia Mathematica by Isaac Newton. You see actually what we call shine through, that is ink from the opposite page which is just shining through the paper, you see that the paper is warped, and you can also see here there is this long ‘s’ also in use, which can very easily be confused with an ‘f’.”

Researchers at the National Library of the Netherlands have spent four years in a European project to improve software tools to read old books.

Researcher Hildelies Balk said: “We improved software for image enhancement, optical character recognition, post-correction of the document and language technology to make it more accessible.”
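As an illustration of what post-correction can involve, the long ‘s’ confusion described above lends itself to a simple dictionary check. The following is only a minimal sketch, not the project's actual software: the tiny `LEXICON`, the function names and the "correct only when exactly one known word matches" rule are all assumptions made for the example.

```python
from itertools import product

# Hypothetical mini-lexicon (an assumption for this example); a real system
# would rely on a large dictionary of historical word forms.
LEXICON = {"first", "also", "softness", "of", "for", "self", "if"}


def candidates(word):
    """Yield every spelling obtained by reading each 'f' as either 'f' or 's'."""
    options = [("f", "s") if ch == "f" else (ch,) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def correct_word(word, lexicon=LEXICON):
    """Return a lexicon-backed reading of an OCR token, or the token unchanged."""
    # A long s that was recognised correctly is simply a modern 's'.
    word = word.replace("ſ", "s")
    if word.lower() in lexicon:
        return word
    matches = [c for c in candidates(word) if c.lower() in lexicon]
    # Correct only when exactly one candidate is a known word;
    # ambiguous or unknown tokens are left untouched.
    return matches[0] if len(matches) == 1 else word


if __name__ == "__main__":
    for token in ["firft", "foftnefs", "ſelf", "of", "xyz"]:
        print(token, "->", correct_word(token))
```

With a realistic dictionary many tokens would have more than one valid reading (for example "fame" and "same"), which is why this sketch leaves ambiguous words alone; resolving them would need context or language modelling.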

That know-how has already been integrated into market-leading digitisation software, and the results are much improved.

Clemens Neudecker talked us through one project: “Here we have an example of the image being straightened. And the next thing is that these borders also need to be cropped. The next step is to transform that into a black and white image in order to enhance the contrast between background and foreground.

“At the very end of the process the user gets the recognised full text, and there’s also the structural features of this text - for example paragraphs, headlines and the like are also detected.”
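The black and white conversion step can be sketched in a few lines. The article does not say which method the project's tools use; the example below assumes Otsu's method, a widely used way of picking a threshold automatically, and the function names and the tiny sample image are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels.

    Otsu's method picks the threshold that maximises the variance between
    the two resulting classes.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of those pixel values
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        if n_dark == 0:
            continue
        n_light = total - n_dark
        if n_light == 0:
            break
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        between_var = n_dark * n_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarise(image):
    """Turn a greyscale image (rows of 0-255 ints) into rows of 1 (ink) and 0 (paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 scan: low values are ink, high values are paper.
    page = [[30, 40, 200],
            [210, 35, 220]]
    for row in binarise(page):
        print(row)
```

A single global threshold like this is the simplest option; warped pages or ink shining through from the opposite side, as in the Newton example, generally call for more adaptive approaches.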

The project claims at least a 15 percent improvement in the accuracy of scanned text.

This means precious archives should become much more accessible.

Hildelies Balk concluded: “Text that is not fully digital, it is virtually invisible. Everyone is used to going into a search engine, and looking for a word, and if they don’t find this it basically isn’t there for them.”

Project details

  • Project acronym: IMPACT
  • Participants: Netherlands (Coordinator), Poland, UK, Slovenia, Bulgaria, Israel, Germany
  • FP7 Project N° 215064
  • Total costs: €15 503 509
  • EU contribution: €11 500 000
  • Duration: January 2008 - June 2012

See also

Futuris, the European research programme - on Euronews. The video on this page was prepared in collaboration with Euronews for the Futuris programme.

Project web site

Project information on CORDIS

Contacts
Unit A1 - External & internal communication,
Directorate-General for Research & Innovation,
European Commission
Tel : +32 2 298 45 40