Anchoring of datasets
Each dataset update triggers the creation of a hash (‘fingerprint’) of the dataset. Every day, Eurostat injects into the EBSI blockchain the hashes of each stable disseminated dataset version, computed for a set of specific formats.
By ‘anchoring’ a dataset, Eurostat ensures that there is a permanent trace of it in the form of its hash, even when it is subsequently overwritten by other versions.
Anchored variants of Eurostat datasets
The number of ways in which subsets can be taken by filtering of categories or selection of variables from a given Eurostat dataset is extremely large, and it would not be practically feasible to anchor every such subset.
Therefore, Eurostat only anchors full datasets. Moreover, Eurostat only anchors specific formats with certain options set, as per this overview:
| Format | Details | Labelling | Attributes | Measures | Return last update | Include non-available data |
|---|---|---|---|---|---|---|
| TSV | Full | - | - | - | - | False |
| SDMX-CSV (1.0) | Full | Codes | - | - | - | False |
| SDMX-CSV (2.0) | - | Codes | All | All | False | - |
| SDMX-ML generic data (2.1) | Full | - | - | - | - | - |
| SDMX-ML data (3.0) | - | - | - | All | - | - |
| JSON (EN) | - | - | - | - | - | - |
Analysts and researchers wishing to benefit from the anchoring of Eurostat data therefore have to download the full dataset:
- in one of the eligible formats
- compressed using the gz method
- with options set as above.
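To compare a downloaded file with an anchored hash, the file's own fingerprint has to be computed first. The following is a minimal sketch, assuming that the anchored hash is the SHA-256 digest of the bytes of the .gz file exactly as downloaded (without decompressing it). The function name `sha256_of_file` is illustrative and not part of any Eurostat or EBSI tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file as a lowercase hex string.

    Assumption: the digest is taken over the compressed file as
    downloaded, not over its decompressed content.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For a file saved under a hypothetical name such as `namq_10_gdp.tsv.gz`, calling `sha256_of_file("namq_10_gdp.tsv.gz")` gives the value to look up. The exact text encoding of the hash expected by an EBSI search (for example, with or without a prefix) is not specified here and may differ from the plain hex string.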
Metadata for Eurostat datasets anchored in EBSI
Following a successful EBSI search, the metadata of the dataset that has been anchored are returned.
For Eurostat, the following metadata are returned:
| Attribute | Key | Description |
|---|---|---|
| creator | | The Eurostat identifier, proving that the file has been anchored by Eurostat and thus corresponds to a genuine Eurostat dataset. |
| events | | This standard EBSI feature is currently not being used in the Eurostat blocks. |
| metadata | | Multi-component Eurostat-defined key-value pairs: |
| metadata | format | Format of the file that has been anchored. |
| metadata | compression | String indicating the file compression method. Eurostat currently uses only the gz compression method. |
| metadata | timestamp | This is the time when the dataset (as hashed) was disseminated, whereas the standard ‘timestamp’ attribute corresponds to the time when the dataset was anchored in EBSI. |
| metadata | DOI | The digital object identifier (DOI) of the dataset. It consists of 2 components: the Eurostat prefix (10.2908) and the unique dataset identifier code. |
| metadata | hashAlgorithm | Algorithm used to hash the file. Eurostat uses the SHA-256 method for anchoring its disseminated dataset files. |
| metadata | dataset-status | For the pilot, the string ‘Unofficial pilot release’ is used to indicate that the anchoring of Eurostat data in EBSI is being piloted. |
| timestamp | | Multi-component standard EBSI feature: |
| timestamp | datetime | Time at which the block was injected into EBSI. |
| timestamp | proof | Proof of the source. This can be either a block number or a hash of timestamp certificate. |
| timestamp | source | Defines how the datetime was resolved. This can be either ‘block’ or ‘external’. |
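Because two different timestamps appear in the returned metadata, it can help to separate them explicitly. The sketch below assumes that the result has already been parsed into a Python dictionary whose nesting mirrors the Attribute and Key columns of the table above. The actual layout of an EBSI response may differ, and the function name `describe_times` is hypothetical.

```python
def describe_times(record):
    """Separate the dissemination time from the anchoring time.

    Assumption: `record` is a dict with a nested "metadata" dict
    (Eurostat-defined) and a nested "timestamp" dict (standard EBSI).
    """
    metadata = record.get("metadata", {})
    ebsi_timestamp = record.get("timestamp", {})
    return {
        # When the dataset (as hashed) was disseminated by Eurostat.
        "disseminated": metadata.get("timestamp"),
        # When the block was injected into EBSI.
        "anchored": ebsi_timestamp.get("datetime"),
        # How the anchoring datetime was resolved: 'block' or 'external'.
        "source": ebsi_timestamp.get("source"),
    }
```

Missing keys yield `None` rather than an error, which keeps the helper usable even if a record omits one of the fields.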
DOIs of Eurostat datasets
The digital object identifier (DOI) of each dataset disseminated by Eurostat contains 2 components:
- the Eurostat prefix (10.2908)
- the unique dataset identifier code.
As an example, for the Eurostat quarterly GDP dataset (NAMQ_10_GDP), regardless of version and format, the DOI is 10.2908/NAMQ_10_GDP.
To access the data behind a DOI, a DOI resolution service can be used. For instance, one can use the doi.org service by concatenating https://doi.org/ and the DOI. In the above example, this yields https://doi.org/10.2908/NAMQ_10_GDP.
This link leads to the version of the data that is currently being disseminated by Eurostat. It cannot be used to access previous versions of the dataset once they have been overwritten by new data.
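The construction of a DOI and of its resolver link can be sketched in a few lines. The prefix and the resolver address are taken from the text above, while the function names `eurostat_doi` and `doi_url` are illustrative only.

```python
EUROSTAT_DOI_PREFIX = "10.2908"
DOI_RESOLVER = "https://doi.org/"


def eurostat_doi(dataset_code):
    """Combine the Eurostat prefix and a dataset identifier code."""
    return f"{EUROSTAT_DOI_PREFIX}/{dataset_code}"


def doi_url(doi):
    """Concatenate the resolver address and the DOI."""
    return DOI_RESOLVER + doi


print(doi_url(eurostat_doi("NAMQ_10_GDP")))
# prints https://doi.org/10.2908/NAMQ_10_GDP
```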
How to know whether a hash has been created by Eurostat
While the group of organisations authorised to write to the EBSI blockchain is limited, it is still theoretically possible that an EBSI block purporting to represent the hash of Eurostat data has been created by another organisation.
It is easy to check that an EBSI block does indeed correspond to a Eurostat dataset: consult the ‘creator’ entry of the metadata in the block and verify that the decentralised identifier (DID) of the creator is identical to the DID of Eurostat, namely did:ebsi:ztChwUsg8k9RNj8JUiDLxMs.
In the example of a successful search, the returned result starts with "creator":"did:ebsi:ztChwUsg8k9RNj8JUiDLxMs", and it can thus be concluded that the hash corresponds to a Eurostat dataset.
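The creator check can be automated once the search result is available as parsed JSON. This is a minimal sketch, assuming the result is a Python dictionary with a top-level "creator" entry holding the full DID string, as in the example above. The function name `anchored_by_eurostat` is hypothetical.

```python
# Full DID of Eurostat as it appears in the 'creator' entry.
EUROSTAT_DID = "did:ebsi:ztChwUsg8k9RNj8JUiDLxMs"


def anchored_by_eurostat(record):
    """Return True only if the record's creator is exactly Eurostat's DID."""
    return record.get("creator") == EUROSTAT_DID
```

An exact string comparison is used on purpose: a DID that merely resembles Eurostat's, or a record without a creator, is rejected.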
Time lag before anchoring
It is important to allow a certain time for each dataset to settle before it is anchored, as it might otherwise contain major, immediately detectable errors, and it would not be in the interest of analysts to base their analysis on such data.
To allow the anchoring to take place on reasonably stable data, only data that have remained stable for 23 hours since they were disseminated are anchored.
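The stability rule amounts to a simple time comparison. The sketch below only illustrates that rule and is not Eurostat's implementation. It assumes that the dissemination time of a version is known and that the version has not been overwritten in the meantime. The names `STABILITY_WINDOW` and `is_stable` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Minimum time a version must remain unchanged before it is anchored.
STABILITY_WINDOW = timedelta(hours=23)


def is_stable(disseminated_at, now):
    """Return True if at least 23 hours have passed since dissemination.

    Assumption: the version disseminated at `disseminated_at` is still
    the current one, so no newer version has replaced it.
    """
    return now - disseminated_at >= STABILITY_WINDOW


disseminated = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
print(is_stable(disseminated, datetime(2024, 1, 2, 9, 59, tzinfo=timezone.utc)))
# prints False (22 hours 59 minutes elapsed)
print(is_stable(disseminated, datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc)))
# prints True (exactly 23 hours elapsed)
```

The dates are arbitrary illustrative values. Whether a version that has been stable for exactly 23 hours qualifies is an assumption of this sketch.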