Digital information comes in myriad forms and formats, so how can we make sure that today's data is not lost forever in inaccessible legacy forms? An EU-funded project has developed a framework that ensures we will always be able to use data, however and wherever it is saved and stored.
Do you hate it when you try to open a computer file and up pops a box which says 'Invalid file format'? Or when vendors of software and operating systems announce they will no longer support and update legacy systems?
Data formats, ICT hardware, software and protocols are constantly evolving. But even as we gather and manipulate so much data, could it be lost just because its format is old or hardware has changed? The prosperity of future generations relies on their access to the information of the past. Are our descendants at risk of knowing nothing because everything we know today is locked in computer systems and codes that one day may be impossible to crack?
A framework for development
The EU-funded SHAMAN (1) project has developed a framework that makes 'digital preservation' (DP) a reality for virtually any data format. People will be able to store and archive digital objects and information with confidence, knowing that they will be fully accessible and usable in the future, whatever the future brings. 'SHAMAN has developed new technologies which could enable us to communicate with the future, securing the valuable digital [assets] we are creating today. They will be readable, accessible and usable for future generations,' remarks Ruben Riestra, the project's coordinator.
'Contrary to the general idea that "digital lasts forever", the risks of losing digital content related to obsolescence of hardware and software should not be underestimated, and can create considerable damage to valuable information assets,' he continues. 'The fast pace of development of digital information isn't necessarily reflected in other areas; for example the Airbus A380 will be operational for some decades and therefore the digital documentation of aeroplane maintenance must be stored, secured and easily accessed for perhaps the next 40 to 50 years.'
A holistic solution
The 'SHAMAN reference architecture' (SRA) provides a unified view of digital preservation, approaching the problem from a holistic perspective. The SRA enables digital preservation to be integrated seamlessly into the overall architecture of an organisation.
The development of the framework required the project partners to explore current DP practices and create an architecture that was not limited to a specific DP methodology. By looking at the specific concerns expressed by those using current digital preservation methods in a variety of organisations, the SHAMAN team was able to produce a solution that makes digital preservation straightforward for organisations whose primary business is not preserving content, but for which preservation is nevertheless important for future success.
The SHAMAN framework includes tools for analysing, managing, accessing and reusing information objects and data across various libraries and archives. It supports the preservation of information, and the specific applications and services that could be applied to the data, all in ways that future technologies and systems will be able to understand and execute.
For example, if you want to preserve animation files you cannot simply store a video; it is also important that you can apply or reapply post-production processes (such as colour transformations) to individual frames of the video. The SHAMAN reference architecture makes this possible.
Beyond the cloud
The architecture also goes beyond simple cloud storage solutions. As Mr Riestra explains, 'the most important difference is the time frame: storage in the cloud is mainly for short term, while digital preservation is related to issues such as multiple migrations over time, hardware and mainly software obsolescence.'
The cloud currently consists of a wide variety of services, solutions, platforms and technologies, all in their early stages. 'Putting digital content on the cloud is still a risk,' Mr Riestra suggests. 'Instead we need robust, long-term solutions that can secure data and metadata in many formats for the future.'
The SHAMAN project produced three prototype applications, designed to demonstrate the validity of the framework and to showcase some exemplary tools developed using the reference architecture.
The first prototype, developed in association with the German National Library, successfully demonstrated that the complete digital lifecycle (creation, assembly, archival, adoption and reuse) could be applied to books and associated material (such as documents, slides and videos). The project showed how complete information, including structural metadata and data on the creation context, could be applied. Furthermore, certain archival functionality, such as migration of image formats from TIFF to JPEG, could be carried out with a high level of quality assurance.
The second prototype applied the SHAMAN framework's concepts to industry, aiming to increase efficiency, ensure legal compliance and improve backup times. 'In industry digital preservation is still not widely accepted and has different requirements, often driven by legislative rules. Trust, authenticity and access rights also shape the solutions currently on offer and in development,' explains Mr Riestra. 'The prototype demonstrated successful integration of a product lifecycle management system, with digital preservation built in, based on a consumer electronics test case.'
The third prototype looked at the science sector, an area that continuously generates and needs to manage a large volume of data. 'We set up three scenarios; we wanted to demonstrate how it was possible to capture and preserve sensor data from civil engineering (dam safety), scientific workflows and experimental data in particle physics,' says Mr Riestra. 'In these areas a large amount of complicated data in many different formats needs to be stored, managed and reused and the SHAMAN infrastructure was able to deal with this effectively, further demonstrating its flexibility.'
The SHAMAN project has already contributed to preserving digital information for future reference by a variety of organisations, including universities, engineering firms, technology spin-outs and national libraries. By developing a holistic solution, designed to provide multiple benefits to organisations, SHAMAN has developed an architecture which makes DP easy and attractive. Hopefully those annoying error boxes will be a thing of the past (and probably one of the few things we may not want to preserve).
The SHAMAN project received EUR 8.4 million (of a total project budget of EUR 12.29 million) in research funding under the EU's Seventh Framework Programme (FP7), in the 'Digital libraries and technology-enhanced learning' area.
(1) 'Sustaining heritage access through multivalent archiving'
- 'Sustaining heritage access through multivalent archiving' website
- SHAMAN project factsheet on CORDIS