About this workshop
A THREE-DAY WORKSHOP
Preserve your digital information to reuse it, while keeping it interoperable and trustworthy!
The CEF eArchiving Building Block is holding a three-day workshop at the end of January 2021.
Register for three days of presentations, interviews, use cases and panel discussions to gain practical knowledge on how
eArchiving is helping data producers, archives and solution providers tackle the issues of long-term accessibility of information.
Join us on Monday 25 January at 11:00 CET for a workshop dedicated to data producers! Follow our demonstration of the Database Preservation Toolkit, learn from others by listening to the story of the Norwegian Health Archives, participate in the panel discussion and more.
Participants should submit questions and suggested discussion points beforehand using the comments section below or by sending an email to CEF-BUILDING-BLOCKS@ec.europa.eu.
The CEF eArchiving Building Block provides specifications, reference software, training and service desk support for digital archiving, including digital preservation. To help users benefit from the features of the Building Block, CEF is running a series of training webinars that cover its core functionality and other relevant CEF eArchiving content.
Agenda - Session #1: Data Producers
11:30 - 11:45
Database Preservation Toolkit
Luís Faria (KEEP Solutions, Portugal)
11:45 - 12:05
Use Case - Norwegian Health Archives
Stephen Mackey (PIQL);
Hanne Mari Hindklev (Norwegian Health Archives)
12:05 - 12:15
12:15 - 12:30
12:30 - 13:00
Panel Discussion: "The European directive on open data and FAIR principles: Impact on long-term preservation of government and research data".
Moderator: Carlota Bustelo (Gabinete Umbus SL, Spain)
José Borbinha (INESC-ID, Lisbon University);
Joy Davidson (Digital Curation Centre and University of Glasgow);
Igor Kuzma (Statistical Office of the Republic of Slovenia);
Andreas Rauber (Technical University, Vienna);
Daniele Rizzi (European Commission - Unit G1: Data Policy and Innovation)
13:00 - 13:15
Ask us a question!
If you have any additional comments or questions on the webinar, or generally concerning CEF eArchiving or the Service Offering, please reach out to us via Service Desk.
You will need to be logged in using an EU Login account to submit a request. Don't have an EU Login account yet? Sign up here.
Written responses from the interactive Q&A
I found the requirements for information packages at https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/Technical+Specifications but I would like to know where I can find the validation requirements for eArchiving-compliant software?
The source documents for validation requirements are the E-ARK specifications. These provide:
· A narrative overview of E-ARK package structure and metadata
· The detailed validation requirements for structure and metadata
· XPaths of selected elements and examples
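As an illustration of what a structural validation rule can look like in software, here is a minimal sketch. It assumes a package layout with a METS.xml file in the package root plus "metadata" and "representations" folders, as described in the E-ARK Common Specification; the function name and the error and warning wording are hypothetical, and the E-ARK specifications remain the authoritative source for the full set of requirements.

```python
from pathlib import Path
import tempfile


def check_package_structure(root: Path) -> list[str]:
    """Return human-readable findings for a package folder (empty list = none).

    Assumed layout (see the E-ARK specifications for the real rules):
    a root folder holding METS.xml, with 'metadata' and 'representations'
    sub-folders expected alongside it.
    """
    if not root.is_dir():
        return [f"error: {root} is not a folder"]
    findings = []
    if not (root / "METS.xml").is_file():
        findings.append("error: METS.xml missing from package root")
    for folder in ("metadata", "representations"):
        if not (root / folder).is_dir():
            findings.append(f"warning: '{folder}' folder not found")
    return findings


if __name__ == "__main__":
    # Build a throwaway package that lacks a 'representations' folder.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "example_package"
        (pkg / "metadata").mkdir(parents=True)
        (pkg / "METS.xml").write_text("<mets/>")
        for finding in check_package_structure(pkg):
            print(finding)
```

A real validator would also check the METS metadata itself against the detailed requirements; this sketch only covers the folder level.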
Is it possible to share the presentation files with us?
The presentation files will appear here: https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/eArchiving+webinar+Series+2020?pk_campaign=XSELL-Bulletin55-202008&pk_source=email&pk_medium=CEFbulletin&pk_content=event
I was not present at the beginning of this webinar. I heard about a video channel; can you please give us some information about it?
Here is the webinar link: https://ec.europa.eu/cefdigital/wiki/x/kgn0D
Regarding DBPTK: will there be support for templates for database metadata edits, to reuse documentation?
At the moment we do not have that feature in the roadmap as none of our sponsoring institutions have pushed forward that development.
How do you approach the problem that organisations have more and more systems to register documents and information, and that sometimes even those who work with information in an organisation are not aware of what information exists and in which system?
That’s not a technological problem, it’s an information management problem. Denmark, for example, has a good protocol in place to address it. All public institutions are required to consult the Danish National Archives before acquiring any information management system. Part of that interaction is to provide detailed information about the system database and its data dictionary, so that in five years' time they are able to harvest and archive that database while having sufficient knowledge about its structure and contents.
Failure to comply with this protocol means that the acquisition by the public agency is considered illegal.
When eHealth anonymises the data, does it mean that you cannot go back to the original ones after a certain number of years?
The eHealth specification does not recommend anonymisation of the data. In fact, the patient's personal information and the patient-centric structure are fundamental to the use cases: delivery of individual records to next of kin, or of cohorts of data to researchers.
How is appraisal of medical records provided? By the archive or by the producer, e.g. sampling? Have you considered two variants of SIP: 1) just metadata for appraisal, 2) metadata and objects for transfer? DICOMs from CT/MRI can be in the tens of GB.
The specification and the Norwegian case anticipate that patient records are added to the archive when they are complete, that is, when a patient is deceased. Appraisal then takes place at the producer, either when they know that a patient has died or when enough time has passed that the patient cannot be alive. The Norwegian archive does not hold images or video (from radiography or pathology) due to the storage implications, but these could be included. If they were, the implications not just for storage but also for transmission and preservation would have to be considered.
Regarding DBPTK: If the originating agency does not have the original database documentation, how can one know how the tables are connected and make the data transformation?
If the database has foreign keys defined, DBPTK can automatically build a diagram of all the relationships between the tables, which may give you an idea of how the database is organised internally. If not, the data must be analysed manually to understand how the tables relate to each other. Alternatively, you can analyse the application or applications that use the original database to understand its internal organisation. If no foreign keys are defined, the data transformation feature cannot be used.
A good strategy is always to include a data dictionary in the Submission Information Package before accepting a new database into the archive.
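To make the role of foreign keys concrete, here is a minimal sketch of reading declared foreign keys from a database and listing the table relationships they imply. It is not DBPTK's implementation: it uses Python's standard sqlite3 module and SQLite's PRAGMA foreign_key_list, and the function name and the patient/visit example tables are hypothetical.

```python
import sqlite3


def foreign_key_edges(conn: sqlite3.Connection) -> list[tuple]:
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Each PRAGMA row is: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, fk[3], fk[2], fk[4]))
    return edges


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE visit (
            id INTEGER PRIMARY KEY,
            patient_id INTEGER REFERENCES patient(id),
            visited_on TEXT
        );
    """)
    for child, column, parent, parent_column in foreign_key_edges(conn):
        print(f"{child}.{column} -> {parent}.{parent_column}")
```

When a database declares no foreign keys, a function like this returns an empty list, which is the situation where manual analysis or a data dictionary becomes necessary.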
When a data transformation is defined in DBPTK, I suppose the anonymized/simplified "data mart" it creates is added to the data preserved, the original data is kept as well, and access to this original data can be limited. Right?
These operations are all done at the index level; the SIARD file remains unchanged. Furthermore, a data transformation can be reverted to the original state.
To anonymise or simplify some tables, for instance to prevent one column from being consulted or exported, DBPTK has an option to hide or show table columns that is independent of the data transformation feature. Restricting access to the data itself is not yet implemented in DBPTK: a user who has access to DBPTK will see all databases that are indexed in the system.
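The idea of hiding columns at the presentation level while the archived data stays untouched can be sketched as follows. This is an illustration under stated assumptions, not how DBPTK is built: the ColumnView class, its methods and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnView:
    """A view over table rows; hiding a column never alters the rows themselves."""
    columns: tuple
    rows: tuple  # stands in for the unchanged archived data
    hidden: set = field(default_factory=set)

    def hide(self, column: str) -> None:
        if column not in self.columns:
            raise KeyError(column)
        self.hidden.add(column)

    def show(self, column: str) -> None:
        self.hidden.discard(column)

    def visible(self):
        """Return (header, rows) restricted to the columns not currently hidden."""
        keep = [i for i, name in enumerate(self.columns) if name not in self.hidden]
        header = tuple(self.columns[i] for i in keep)
        body = [tuple(row[i] for i in keep) for row in self.rows]
        return header, body


if __name__ == "__main__":
    view = ColumnView(
        columns=("id", "name", "diagnosis"),
        rows=((1, "Ada", "A01"), (2, "Bo", "B02")),
    )
    view.hide("name")
    print(view.visible())  # 'name' is left out of browsing and export
    view.show("name")
    print(view.visible())  # the full table is back; nothing was deleted
```

Because hiding only filters what is displayed, it is reversible, but it is not access control: anyone who can reach the underlying data can still read the hidden column.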
How come you chose to specify a new standard for eHealth (CITS) instead of using SIARD as the preservation format? Are these data not originally produced in databases?
The major part of the information content of patient medical records is still in plain narrative, i.e. documents which hold a lot of detailed information that is desirable for the described use cases. Database medical record systems hold metadata and summary data in addition to these documentary records, plus other files such as images and videos. For the purposes of extracting either single complete patient records, or cohorts of records for research, from an archive, a granular structure of one patient per archival package is preferable to the SIARD approach of a single archival package for an entire database or database extract. Other health use cases, such as that being examined by eHealth2 (cancer registries), are conversely better served by the SIARD model.
What would be an example of a research question on public sector information? A catalogue of these questions would help to define which public sector information should be preserved, wouldn't it?
A catalogue of these questions could be part of a methodology for appraising public sector research information. Experience in appraising other kinds of information, as practised in archives and in records management programmes, is an important basis on which to build this methodology, in consensus with the research communities. Questions for specific groups or domains of research will probably be needed, but a starting point could be the common minimum questions about the potential reusability of data, their uniqueness, their nature as evidence of research conclusions, etc. Some findings have been achieved through different projects, but what is needed is a strong leader to consolidate and spread this methodology, based on collaboration between archival specialists and the research community.
About CEF eArchiving
Financed by the Connecting Europe Facility (CEF), the purpose of the CEF eArchiving Building Block is to promote the uptake and accelerate the use of eArchiving specifications amongst both public and private entities established in the EU. The benefits to both users and the wider economy of adopting eArchiving include:
- Flexibility: supports scaling of digital archival systems from small to very large
- Standardisation: enables information assets to be transmitted, preserved and re-used across borders as well as time
- Efficiency: accelerates the delivery time of a working digital archive, while controlling costs
- Transparency: ensures a high level of confidence among all participants in the information value chain
- Risk management: reduces risks in information assurance
To do so, CEF eArchiving makes the following services available:
About the CEF building blocks
The CEF building blocks provide basic services which can be reused to enable more complex digital public services offered to citizens, businesses and public administrations. They provide reusable tools and services that help to underpin the Digital Single Market, which aims to remove digital regulatory barriers, contributing as much as EUR 415 billion per year to the European economy. The CEF Digital Portal is the home of the CEF building blocks (Big Data Test Infrastructure, Blockchain, Context Broker, eArchiving, eID, eDelivery, eInvoicing, eSignature, eTranslation and Once Only Principle). It is the one-stop-shop for information about the building blocks.