About this workshop
A THREE-DAY WORKSHOP
Preserve your digital information to reuse it, while keeping it interoperable and trustworthy!
The CEF eArchiving Building Block is holding a three-day workshop at the end of January 2021.
Register for three days of presentations, interviews, use cases and panel discussions to gain practical knowledge on how
eArchiving is helping data producers, archives and solution providers tackle the issues of long-term accessibility of information.
Join us on Thursday 28 January at 10:00 CET for a workshop dedicated to solution providers! Follow the demonstration, discover the innovative ideas of the two finalists and the winner of the Digital Innovation Challenge and how they plan to reuse eArchiving, then participate in the panel discussion and more.
Participants should submit questions and suggested discussion points beforehand using the comments section below or by sending an email to CEF-BUILDING-BLOCKS@ec.europa.eu.
The CEF eArchiving Building Block provides specifications, reference software, training and service desk support for digital archiving, including digital preservation. In order to help users benefit from the features of the Building Block, CEF is running a series of training Webinars that will cover its core functionality and other relevant CEF eArchiving content.
Agenda - Session #3: Solution Providers
10:00 - 10:15
Fulgencio Sanmartín, European Commission DG CNECT G2
Kuldar Aas, Estonian National Archives
10:20 - 10:35
Carl Wilson (OPF)
10:35 - 10:45
10:45 - 11:05
Use case: "eArchiving use in start-ups - The Digital Innovation Challenge finalists"
Moderator: Janet Anderson, Danish National Archives
11:05 - 11:20
11:20 - 11:50
Panel discussion: "The future of Digital Archiving across Europe"
Moderator: Natasa Milic-Frayling (Intact Digital Ltd)
Teo Redondo (Libnova);
Jurry de la Mar (T-Systems);
Matthew Addis (Arkivum);
Miguel Ferreira (KEEP Solutions)
11:50 - 12:00
Ask us a question!
If you have any additional comments or questions on the webinar, or generally concerning CEF eArchiving or the Service Offering, please reach out to us via Service Desk.
You will need to be logged in using an EU Login account to submit a request. Don't have an EU Login account yet? Sign up here.
Written responses from the interactive Q&A
Could you please provide the IP-validation web-address?
E-ARK Information Package validation is available at the following locations:
* EC Interoperability Testbed: https://www.itb.ec.europa.eu/itb/. This is the official EC-maintained instance, which is used for official eArchiving compliance testing. Note that registration is needed in order to use the testbed.
* E-ARK validation demonstrator: https://pyip.openpreservation.org/. This is a public validator instance which is used for demonstration purposes. Note that this demonstrator does not store the uploaded information packages and does not have the resources to support larger packages.
* A REST validator is also available online as a Docker container: https://hub.docker.com/repository/docker/eark4all/py-ip-validato
Dear Jurry, I do believe the industry sector has established digital archiving for many years, often before the public. Are you however sure that the enterprise and profit sector better adopt digital archiving standards like the E-ARK specification than the public sector? I have doubts.
Jurry de la Mar:
Here is a link with reference information on the past 25 years of experience in industry:
For science data we believe the new innovative approach to provide digital archiving as part of a full lifecycle data management on cloud platforms is the best approach under development in the Archiver project. Anybody interested in our European and OpenStack-based public cloud services please have a look at https://open-telekom-cloud.com/en
I would be happy to share more detail with you on how industry, e.g. Airbus, goes about the preservation of its data, which is also subject to regulatory constraints. Please contact me at email@example.com
Arkivum works with many organisations in industry on archiving and preservation, especially in life sciences, for example drug trial archiving. Each industry tends to have its own standards and regulations that incorporate the need for archiving, e.g. as part of GxP for pharma, LOTAR for aerospace, MiFID II for financial services. Some build upon or reference OAIS or TDR. But very few industry sectors are aware of E-ARK. Therefore, there is a job of work to be done by the CEFDIGITAL eArchiving building block to promote package standards and related work to industry and encourage adoption outside of the public sector.
I agree with the provocation. I don’t think the private sector will endure eArchiving standards such as E-ARK specifications unless forced to by law or any other sort of mandate.
Talking of scalability: how do your organisation's solutions handle the ever-growing amount of data? For instance, what is the largest amount of data you have tested in terms of ingest (TB or PB)? One of you mentioned that there exist numerous solutions for "small-medium enterprises", but what about the large enterprises? When will a COTS product be able to handle huge PB-scale archiving, or even IP sizes of 50 TB or so?
Jurry de la Mar:
We manage more than 500 PB on our public cloud service, and that has been collected over the last 4 years alone. It is the fastest-growing but also the most cost-efficient operation we run in the company today.
As for PB scale archiving, in ARCHIVER Phase2 we are targeting 100TB per day ingest rates and PB volumes - this is just one step along the way to targeting the 100PB+ archiving requirements that several of the ARCHIVER end users have. We’ve already archived PB datasets for some of our customers with file counts in the billions.
We manage directly (managed services) or indirectly (on-prem deployments) several tens of PB, from small deployments (~10 TB) to large-scale ones (more than 5 PB per individual customer and growing daily), so we see a varied spectrum of situations.
Our solution, RODA, has been designed from the ground up to be horizontally scalable, so in theory there is no limit for how much data you can feed to the system.
However, one must be aware that scalability is not only measured in petabytes. Each project may encompass a different kind of scalability. For example, we have projects where the variety of formats is the focus of the scalability (heterogeneity). In others, it is the ability to scale the ingest process to millions of records per day. In others still, it is the ability to support thousands of consumers accessing the repository at the same time while ensuring a great experience for all of them.
So, in the end… scalability is not only about petabytes. That’s just part of the problem, which typically is solved by scalable storage solutions such as the ones we can find on the cloud.
Also, please be aware that for most systems, it is harder to cope with lots of small files than with a small number of large files.
Each project has its own set of requirements, that’s why an off-the-shelf solution does not always meet the requirements of a particular project.
I think it was argued yesterday that from a pragmatic point of view archival system developers are still interested in some level of vendor lock-in, even if they publicly argue otherwise :) Can you convince us that this is indeed not the mindset in your organisation?
To avoid vendor lock-in, Arkivum for example includes data escrow of AIPs and other built-in measures so that customers can migrate away from us without being locked in. Having a strong exit capability can actually be a competitive advantage in the market, especially because users hate lock-in (and rightly so). The key to avoiding lock-in is the use of open standards and open specifications, and to support exit plans that allow customers to know that they are protected against a range of disaster and emergency migration scenarios. Most important is making it easy for customers to test these strategies in advance so they know they are working in practice. This gives customers a lot of confidence, and demonstrating the lack of lock-in can be a big selling point.
Jurry de la Mar:
We have committed to making the complete toolset that we develop in the Archiver project available on GitHub. Every other service provider organisation could use it as DIY if that is feasible. We will provide it as a turn-key service on private or public clouds.
Similar to what Matthew mentions, a clear exit strategy (including data escrow) is compulsory in every tender response, so this gives the eventual customer genuine freedom from lock-in, especially when we are talking about 4- or 5-year contracts.
Vendor lock-in is one of the worst risks you may have to deal with as a digital curator or data owner. Solutions will eventually become obsolete and data must outlive those systems. Vendor and solution freedom is something that we all should aim for. The eArchiving specifications are a safeguard in these types of scenarios as they enable (or greatly simplify) the transfer of data between one digital preservation system and a succeeding one.
Which ISO standard was Matthew talking about as being a bit too high up for community acceptance?
ISO16363 - Trusted Digital Repositories is the one I mentioned. It can be quite heavyweight, e.g. compared to CoreTrustSeal or simple self-assessment frameworks such as DPC RAM.
Final point on 16363 is that formal certification through audits, e.g. by PTAB, is the hard bit - using the spec for self-assessment is much easier and can be quite useful.
I've heard there is only one organisation globally that has managed to pass ISO16363 certification?
As far as I know, the European Publications Office is certified ISO16363. Fulgencio may be able to confirm this.
ISO 16363 is a checklist of OAIS. If you have 16363, it is definitely a sign that you are correctly implementing OAIS, but OAIS conformance is kind of generic, because it was written in very generic terms that are neither technical nor specific (and OAIS was written that way on purpose, or so some of the authors told me).
Jurry de la Mar:
Fully agree, good standards should be defined this way and success in industry with most ISO standards proves it.
Regarding ISO 16363, there is a big difference between people who use it internally as a guide or checklist and those who have been formally audited and certified as conformant. Formal audit and certification can only be done by accredited organisations who follow ISO 16919 (the standard for auditors), for example PTAB. The list of organisations that have been audited and certified to ISO16363 is here: http://www.iso16363.org/iso-certification/certified-clients/ There are only two on the list, and one of those has lost certification and needs to be re-certified.
Is Bagit still rarely adopted in Europe?
Yes. As far as I know, it’s rarely used in Europe, especially by the public sector, but I don’t know the entire European market.
We see Bagit used quite often - mostly for transfer of content so it can be validated as correctly received at the bit level, e.g. into a DPS. That includes Europe.
Jurry de la Mar:
We see a strong increase of interest in Bagit and our Archiver solution supports it, because it can enable more flexibility in managing millions of files and petabytes of data.
Bagit is simply a way of packaging information (in a zip file) bundled with extra metadata. An E-ARK IP could be bundled as a Bag with the IP in the data sub-folder. The issue is that the bag metadata lives above the data directory and can’t be incorporated into the bag. Any validation could only be done on the contents of the data folder which would have to be the complete IP. The bag structure does bring some useful features, particularly the ability to say something about the IP outside of the IP structure, something the current implementation lacks.
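The packaging idea in the answers above (an IP placed in the data sub-folder of a bag, with checksums used to confirm correct receipt at the bit level) can be sketched in a few lines of Python. This is only a minimal illustration, assuming the BagIt 1.0 layout from RFC 8493 (a `bagit.txt` declaration, a `data` payload directory and a `manifest-sha256.txt` file). The helper names `make_bag` and `verify_bag` and the folder name `example-ip` are hypothetical, and the sketch does not validate the IP itself against the E-ARK specifications.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(ip_dir: Path, bag_dir: Path) -> Path:
    """Copy an IP folder into <bag_dir>/data/ and write the bag tag files.

    Hypothetical helper: the complete IP becomes the bag payload, while
    the bag metadata lives one level above the data directory.
    """
    shutil.copytree(ip_dir, bag_dir / "data" / ip_dir.name)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    payload_files = sorted(
        p for p in (bag_dir / "data").rglob("*") if p.is_file()
    )
    lines = [
        f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}\n"
        for p in payload_files
    ]
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(lines), encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> list[str]:
    """Return the payload paths that are missing or fail their checksum."""
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ip = Path(tmp) / "example-ip"  # stand-in for a real IP folder
        ip.mkdir()
        (ip / "METS.xml").write_text("<mets/>", encoding="utf-8")
        bag = make_bag(ip, Path(tmp) / "bag")
        print("failed payload files:", verify_bag(bag))
```

A receiving system would run the equivalent of `verify_bag` after transfer and only then hand the contents of the data folder to an IP validator, which matches the division of labour described above.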
About CEF eArchiving
Financed by the Connecting Europe Facility (CEF), the purpose of the CEF eArchiving Building Block is to promote the uptake and accelerate the use of eArchiving specifications amongst both public and private entities established in the EU. The benefits to both users and the wider economy of adopting eArchiving include:
- Flexibility: supports scaling of digital archival systems from small to very large
- Standardisation: enables information assets to be transmitted, preserved and re-used across borders as well as time
- Efficiency: accelerates the delivery time of a working digital archive, while controlling costs
- Transparency: ensures a high level of confidence among all participants in the information value chain
- Risk management: reduces risks in information assurance
To do so, CEF eArchiving makes the following services available:
About the CEF building blocks
The CEF building blocks provide basic services which can be reused to enable more complex digital public services offered to citizens, businesses and public administrations. They provide reusable tools and services helping to underpin the Digital Single Market, which aims to remove digital regulatory barriers, contributing as much as EUR 415 billion per year to the European economy. The CEF Digital Portal is the home of the CEF building blocks (Big Data Test Infrastructure, Blockchain, Context Broker, eArchiving, eID, eDelivery, eInvoicing, eSignature, eTranslation and Once Only Principle). It is the one-stop shop for information about the building blocks.