Use eArchiving

eArchiving is arguably the most complex of the CEF Building Blocks. This "Use eArchiving" section is an in-depth guide to the end-to-end eArchiving workflow, with use cases as examples.

The aim of eArchiving is to provide the core specifications, software, training and knowledge to help data creators, software developers and digital archives tackle the challenge of short-, medium- and long-term data management and reuse in a sustainable, authentic, cost-efficient, manageable and interoperable way. The core of eArchiving is formed by the Information Package specifications, which describe a common format for storing bulk data and metadata in a platform-independent, authentic and long-term understandable way. The specifications are well suited to migrating valuable long-term data between generations of information systems, transferring data to dedicated long-term repositories (i.e. digital archives), and preserving and reusing data over shorter or extended periods of time and across generations of software systems. Alongside the specifications, eArchiving offers a set of sample software that demonstrates the format in different scenarios and business environments, and consultancy on long-term digital preservation risks and their mitigation.

How to get started?

Using eArchiving starts with mapping your digital preservation problem to the eArchiving format specifications and tool portfolio, in other words selecting the format specifications and tools that best address your problem. Finding the right eArchiving components is not always easy. You have to understand the logic behind the eArchiving elements and have some knowledge of the eArchiving use cases, specifications and tools.

This section aims to help newcomers to digital archiving or to the eArchiving Building Block find the best solution. We guide you through the eArchiving concepts, approaches and elements:

  1. The OAIS Reference Model of a digital archive and its information package and process concepts
  2. E-ARK use cases and processes
  3. Understanding eArchiving specifications and tools
  4. Finding solutions to your digital archiving problems

Main Standards and references

The eArchiving specifications are based on common, international standards for transmitting, describing and preserving digital data. The main standard is the Reference Model for an Open Archival Information System (the OAIS Reference Model), which has Information Packages as its basis. The main standard for transmitting Information Packages is the Metadata Encoding and Transmission Standard (METS), and the main standard for preserving Information Packages is PREMIS (Preservation Metadata: Implementation Strategies).
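To give a feel for what a METS document looks like, here is a minimal sketch in Python that builds a bare METS skeleton (a file section plus a structural map) with the standard library. The function name, the package identifier and the file paths are illustrative assumptions and are not taken from the eArchiving specifications; the result is not validated against the METS schema or against any E-ARK profile, which require further elements and attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of METS and XLink.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name):
    """Return the namespace-qualified tag name for a METS element."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton(package_id, file_paths):
    """Build a bare METS tree listing the given files (hypothetical helper)."""
    root = ET.Element(mets_tag("mets"), {"OBJID": package_id})

    # fileSec lists the files that belong to the package.
    file_sec = ET.SubElement(root, mets_tag("fileSec"))
    file_grp = ET.SubElement(file_sec, mets_tag("fileGrp"), {"USE": "data"})

    # structMap describes how the files are organised.
    struct_map = ET.SubElement(root, mets_tag("structMap"))
    top_div = ET.SubElement(struct_map, mets_tag("div"), {"LABEL": package_id})

    for number, path in enumerate(file_paths, start=1):
        file_id = f"file-{number}"
        file_el = ET.SubElement(file_grp, mets_tag("file"), {"ID": file_id})
        ET.SubElement(
            file_el,
            mets_tag("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path},
        )
        # Point from the structural map back to the file entry.
        ET.SubElement(top_div, mets_tag("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    tree = build_mets_skeleton("example-package", ["data/report.pdf", "data/table.csv"])
    print(ET.tostring(tree, encoding="unicode"))
```

A real information package would add a METS header and descriptive and preservation metadata (for example PREMIS) on top of this skeleton.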

OAIS Reference Model

The conceptual starting point for the information package specifications, use cases and processes of the E-ARK project was the Open Archival Information System (OAIS) Reference Model (https://public.ccsds.org/pubs/650x0m2.pdf). The OAIS Reference Model is designed as a conceptual framework for a digital archive. The model defines three types of information packages and a set of electronic archival processes.


OAIS Functional Entities (source: public.ccsds.org)


An information package, according to the OAIS model, contains the archival content along with descriptive and technical metadata. The three information package types are: 

  • Submission Information Package (SIP), the input of the archive,
  • Dissemination Information Package (DIP), the output of the archive, and
  • Archival Information Package (AIP), the internal format managed by the archive during long-term preservation.


The processes of an OAIS archive are:

  • Ingest
  • Archival Storage
  • Preservation Planning
  • Data Management
  • Access
  • Administration


The above list is often extended with a Pre-Ingest process. Pre-Ingest covers the data and metadata assessment and compilation into the Submission Information Package. The Pre-Ingest process is usually performed by the data producer institution (Producer).
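The relationship between the package types and the processes that create them can be sketched as a small data model. The Python sketch below is only an illustration: the names `PackageType`, `PACKAGE_FLOW`, `produced_by` and `describe_flow` are assumptions made for this example, and the remaining OAIS processes (Archival Storage, Preservation Planning, Data Management, Administration) are left out because they do not turn one package type into another.

```python
from enum import Enum


class PackageType(Enum):
    SIP = "Submission Information Package"
    AIP = "Archival Information Package"
    DIP = "Dissemination Information Package"


# (process, package consumed, package produced)
PACKAGE_FLOW = [
    ("Pre-Ingest", None, PackageType.SIP),             # Producer compiles the SIP
    ("Ingest", PackageType.SIP, PackageType.AIP),      # archive turns the SIP into an AIP
    ("Access", PackageType.AIP, PackageType.DIP),      # archive delivers a DIP
]


def produced_by(process):
    """Return the package type a process produces."""
    for name, _consumed, produced in PACKAGE_FLOW:
        if name == process:
            return produced
    raise KeyError(process)


def describe_flow():
    """Return one human-readable line per process."""
    lines = []
    for name, consumed, produced in PACKAGE_FLOW:
        source = consumed.name if consumed else "source data"
        lines.append(f"{name}: {source} -> {produced.name}")
    return lines


if __name__ == "__main__":
    for line in describe_flow():
        print(line)
```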


E-ARK use cases and processes

In the scope of the E-ARK project (a predecessor of the eArchiving Building Block that ran from 2014 to 2017), the E-ARK team has

  • identified the E-ARK use cases and detailed the related OAIS processes,
  • developed a set of format specifications (including a detailed structure for all three types of OAIS information packages),
  • and developed or modified a set of tools to process the information packages.


Use cases identified by the E-ARK project

  • Pre-Ingest and Ingest use cases
    • Export and ingest relational database(s) based on SIARD
    • Export and ingest electronic records based on MoReq2010
    • Package and ingest simple files from a file system
    • Package and ingest geodata related to other digital content in the package 
  • Access use cases
    • Access relational database(s) based on SIARD
    • Access relational database(s) via SOLR (not SQL)
    • Access single electronic records/files (ingested from an ERMS or from a file system)
    • Access data via OLAP (data cube) technology
    • Access geodata related to other digital content in the package


Understanding eArchiving specifications and tools

The following tables show the digital archiving components resulting from the E-ARK project layered according to the OAIS processes. The columns of the (source and intermediate) formats are left white while the columns containing the tools – performing the transition from one format to the other – are drawn in amber.

Pre-Ingest and Ingest

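| Data Source | Export tool | Content type format | SIP creation tool | Submission Information Package | Ingest tool | Archival Information Package | Archival Repository |
|---|---|---|---|---|---|---|---|
| Database | DBVTK | SIARD 2.0 | RODA-In, ESS ETP, SIP Creator (E-ARK Web) | E-ARK SIP | RODA, ESS ETA, SIP2AIP Converter (E-ARK Web) | E-ARK AIP | RODA Repository, ESS Preservation Platform, HDFS Storage, SOLR Index (E-ARK Web) |
| ERMS | ERMS export module | ERMS content type | (as above) | (as above) | (as above) | (as above) | (as above) |
| Files | | | (as above) | (as above) | (as above) | (as above) | (as above) |
| Geodata | QGIS* | Geodata content type | (as above) | (as above) | (as above) | (as above) | (as above) |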
 
*QGIS is not an E-ARK product. Some freely available and (almost) industry standard tools were integrated into and tested together with the E-ARK toolset during some pilot scenarios in the E-ARK project. 

Access

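| Archival Repository | Archival Information Package | Search and Order tools | DIP creation tool | Dissemination Information Package | Viewer | Output Format |
|---|---|---|---|---|---|---|
| RODA Repository, ESS Preservation Platform, HDFS Storage, SOLR Index (E-ARK Web) | E-ARK AIP | Search & Display, Order Management Tool, Lily Ingest*, E-ARK Web Search | RODA, ESS EPP, AIP2DIP Converter (E-ARK Web) | E-ARK DIP | DBVTK | Relational Database |
| (as above) | (as above) | (as above) | (as above) | (as above) | SOLR | SOLR Database |
| (as above) | (as above) | (as above) | (as above) | (as above) | CMIS Portal Viewer | ERMS record |
| (as above) | (as above) | (as above) | (as above) | (as above) | IP Viewer | Simple files |
| (as above) | (as above) | (as above) | (as above) | (as above) | OLAP* Viewer | OLAP Data |
| (as above) | (as above) | (as above) | (as above) | (as above) | QGIS*, Peripleo* | Geodata |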

*QGIS, Peripleo, Lily Ingest, Oracle OLAP are not E-ARK products. Some freely available and (almost) industry standard tools were integrated into and tested together with the E-ARK toolset during some pilot scenarios in the E-ARK project.
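The two ends of the tables above (how a data source is exported on the way in, and which viewer presents which output format on the way out) lend themselves to a simple lookup. The following Python sketch transcribes those columns into dictionaries; the names `EXPORT`, `VIEWERS` and `describe_route` are illustrative assumptions, and the data is only as complete as the tables on this page.

```python
# Data source -> (export tool, content type format); None means the table
# lists no dedicated export tool or content type for that source.
EXPORT = {
    "Database": ("DBVTK", "SIARD 2.0"),
    "ERMS": ("ERMS export module", "ERMS content type"),
    "Files": (None, None),
    "Geodata": ("QGIS", "Geodata content type"),
}

# Output format -> viewers listed for it in the Access table.
VIEWERS = {
    "Relational Database": ["DBVTK"],
    "SOLR Database": ["SOLR"],
    "ERMS record": ["CMIS Portal Viewer"],
    "Simple files": ["IP Viewer"],
    "OLAP Data": ["OLAP Viewer"],
    "Geodata": ["QGIS", "Peripleo"],
}


def describe_route(source, output_format):
    """Summarise the export side and the viewing side for one scenario."""
    export_tool, content_type = EXPORT[source]
    steps = []
    if export_tool:
        steps.append(f"export {source} with {export_tool} as {content_type}")
    else:
        steps.append(f"package {source} directly into a SIP")
    steps.append("create E-ARK SIP, ingest as E-ARK AIP, disseminate as E-ARK DIP")
    viewers = " or ".join(VIEWERS[output_format])
    steps.append(f"view {output_format} with {viewers}")
    return steps


if __name__ == "__main__":
    for step in describe_route("Database", "Relational Database"):
        print(step)
```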


As the above tables show, the format specifications mark the connection points between consecutive processing steps. Once the formats are standardized, any tool that writes a given format can hand its output to any tool that reads it, so consecutive process steps become compatible. That is exactly why detailed format specifications were needed: the OAIS model doesn’t specify the internal structure of the information packages. One of the main goals of the E-ARK project was to provide the archival community with detailed format specifications.


The E-ARK project has defined the following format specifications:

For OAIS information packages

  • Common Specification for Information Packages
  • SIP Specification
  • AIP Specification
  • DIP Specification

For content types (to store data of specific types within the information package)

  • SIARD 2.0 format for databases
  • ERMS format for electronic records from records management systems
  • Geodata format to store geographic information along with other data or content types


Every tool developed or modified in the scope of the E-ARK project is compatible with all the above format specifications.


The E-ARK Web solution was developed as a reference implementation. Although it is not a mature tool set (it is currently under further development), all components were well tested and tried in cooperation with the specifications and other tools in some of the more than twenty real-world E-ARK pilot scenarios.


You can find a basic description of every component, as well as links to more detailed information, on the Library page of the General Model (http://kc.dlmforum.eu/gm3).

The General Model provides information about all E-ARK components from different aspects. The cross-reference view shows the connected elements of a selected component. The components are divided into four groups: format specifications, use cases and processes, tools and pilot scenarios.

The above product portfolio of the E-ARK project is considered an initial release of the eArchiving Building Block services. (Please note that the General Model is being redesigned according to the service-oriented approach of the eArchiving Building Block.)



Finding solutions to your digital archiving problems

Finding an eArchiving solution that corresponds to your requirements means mapping your problem to the eArchiving format specifications and the Sample Software Portfolio tools, in other words finding the specifications and tools that best match your demands.

Finding the right eArchiving components is not always easy. eArchiving follows a modular approach. You can find more than one solution, sometimes overlapping, to one particular digital archiving task, usually with tools from different vendors. To help you find your way in the Sample Software Portfolio, we recommend consulting the General Model. With its versatile views, the General Model helps you find the information you need to select the appropriate components.

    

Probably the most informative section of the General Model is the Map view. It shows all E-ARK elements organized according to the OAIS processes. The Map view has four subviews:

  • format specifications,
  • use cases and processes,
  • tools,
  • and pilots.

The format specification subview highlights the format specifications (along with the source and output content formats) in white.

On the left you can find the source formats corresponding to the pre-ingest use cases. Each input/output element (content types, SIP, AIP, DIP) then corresponds to an E-ARK format specification. On the far right you can find the output file formats after a successful access process.


We recommend using one of the eArchiving format specifications if you can. If you decide to use your local formats, there is no guarantee that the eArchiving tools can process them.


The Map view also presents the tools processing the input formats into the output formats.   

The tools subview shows the names of the tools in orange, written on the arrows pointing from the input to the output. For example, RODA-In creates an E-ARK SIP from any of the content types to the left.

As you can see, there can be more than one tool for the same purpose (e.g. creating a SIP can be performed by 4 different tools). We recommend experimenting a little with the candidate tools to find out which one best suits your requirements, infrastructure and archival environment. You can find more information about how other institutions have used the selected tools and specifications in the E-ARK pilot documentation (explained below). Although in theory all modules of the Sample Software Portfolio are compatible with each other, using tools from the same vendor is usually a safer solution, because they are constantly tested with each other in many digital archiving environments and scenarios.
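Choosing among overlapping tools can be thought of as enumerating possible tool chains and then narrowing them down. The Python sketch below is a rough illustration under stated assumptions: it uses only the tool names that appear in the tables on this page (the Map view may list more), it reads "(E-ARK Web)" as marking components of the E-ARK Web reference implementation, and it guesses the tool family from the name prefix, which is a simplification rather than official vendor information. The names `STEPS`, `family` and `tool_chains` are hypothetical.

```python
from itertools import product

# Processing step -> candidate tools, transcribed from the tables on this page.
STEPS = [
    ("SIP creation", ["RODA-In", "ESS ETP", "SIP Creator"]),
    ("Ingest", ["RODA", "ESS ETA", "SIP2AIP Converter"]),
    ("DIP creation", ["RODA", "ESS EPP", "AIP2DIP Converter"]),
]


def family(tool):
    """Guess the tool family from its name (an assumption for illustration)."""
    if tool.startswith("RODA"):
        return "RODA"
    if tool.startswith("ESS"):
        return "ESS"
    return "E-ARK Web"


def tool_chains(same_family_only=False):
    """List every combination of one tool per step, optionally single-family."""
    step_names = [name for name, _tools in STEPS]
    chains = []
    for combo in product(*(tools for _name, tools in STEPS)):
        if same_family_only and len({family(tool) for tool in combo}) != 1:
            continue
        chains.append(dict(zip(step_names, combo)))
    return chains


if __name__ == "__main__":
    print(len(tool_chains()), "chains in total")
    for chain in tool_chains(same_family_only=True):
        print(chain)
```

Restricting the result to single-family chains mirrors the advice above that tools from the same vendor are usually the safer combination; mixed chains remain possible wherever the tools share the E-ARK formats.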



The process subview shows the high-level process diagrams of the selected OAIS process.

As with the information packages, the OAIS model only names the required processes; it doesn’t define their internal structure or give any detail. The E-ARK project defined the processes at the level of detail needed to design the tools.


The pilots subview summarizes the pilot scenarios executed by the project.  

In order to test the tools and specifications of the E-ARK use cases, the project has carefully planned and executed a set of more than twenty real-world pilot scenarios at archival institutions in seven European countries. The view presents the scenarios of each pilot site showing the tools and specifications they have tested along the process map from pre-ingest to access.


As the E-ARK project focused on testing the cooperation and interactions of the different components, these pilot scenarios can be very useful when planning your own digital archival scenarios. If you can find pilot scenarios resembling your own, the pilot documentation will help you implement your own solution. A pilot scenario can be helpful if it implements the same use case (such as archiving databases, or geodata along with your content) or uses the same tools you are planning to try out.


You can find detailed information about the pilots in the D2.3 Detailed Pilot Specification and D2.4 Pilot Documentation documents.