A few useful points - SDMX Web Services
How to build a REST query to retrieve Eurostat data
This section explains how to build a REST query to retrieve data.
To read more about building REST queries, consult the REST SDMX 2.1 page of this online help, or see section 4.4 of the following document: SDMX GUIDELINES FOR THE USE OF WEB SERVICES
The same principles apply to SOAP requests; the main difference is that filters must be specified in an appropriately formatted XML request message for the relevant operations (GetGenericData, GetStructureSpecificData).
What is a REST query?
A REST query is based on the following URL scheme:
resource is the desired resource or artifact (in this case we want to retrieve data, so it would be "data"),
flowRef is the reference to the dataflow (e.g. nama_10_gdp), and
key is the set of filters to be applied, plus
?[startPeriod=yyyy[mmdd]&]endPeriod=yyyy[mmdd] for any optional additional time filtering. A date can contain the year only, the year and month, or the full year, month and day.
This type of request refers to a dataflow, and gives a key to indicate desired filters.
By default, the response is provided in the SDMX-ML 2.1 generic schema. To receive a response in the SDMX-ML 2.1 structure specific schema, set the HTTP header field "Accept" to application/vnd.sdmx.structurespecificdata+xml. The structure specific schema is better suited to processing large amounts of data.
More information about SDMX-ML formats is available in the document SDMX GUIDELINES FOR THE USE OF WEB SERVICES.
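As a minimal sketch of the request described above, the following Python snippet assembles a URL from the parts resource, flowRef and key, adds optional time filtering, and prepares a request that asks for the structure specific schema. The base URL is the one used in the examples on this page; the function name `prepare_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL as used in the examples on this page.
BASE_URL = "https://ec.europa.eu/eurostat/SDMX/diss-web/rest"
STRUCTURE_SPECIFIC = "application/vnd.sdmx.structurespecificdata+xml"


def prepare_request(resource, flow_ref, key, start_period=None,
                    end_period=None, structure_specific=False):
    """Build (but do not send) a REST request of the form resource/flowRef/key."""
    url = f"{BASE_URL}/{resource}/{flow_ref}/{key}"
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    # Without an Accept header the service answers in the generic schema.
    headers = {"Accept": STRUCTURE_SPECIFIC} if structure_specific else {}
    return Request(url, headers=headers)


req = prepare_request("data", "nama_10_gdp", ".CP_MEUR.B1GQ.DE",
                      start_period="2010", structure_specific=True)
print(req.full_url)
print(req.get_header("Accept"))
```

The prepared request could then be sent with `urllib.request.urlopen(req)`.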
The steps to build a query
- Identify your dataflow name.
By checking the navigation tree on the Eurostat portal homepage (/eurostat/data/database)
By retrieving the list of dataflows through our web services (e.g. REST: https://ec.europa.eu/eurostat/SDMX/diss-web/rest/dataflow/ESTAT/all/latest) and finding the required dataflow.
By already knowing a Eurostat dataset code, which is the identifier for dataflows as well.
- Check the dimensions and their elements.
By retrieving the Data Structure Definition (DSD), e.g. the REST request for the DSD about nama_10_gdp is https://ec.europa.eu/eurostat/SDMX/diss-web/rest/datastructure/ESTAT/DSD_nama_10_gdp
By checking the Data Explorer for the relevant data, e.g. for nama_10_gdp. (https://ec.europa.eu/eurostat/product?mode=view&code=nama_10_gdp&language=en).
Please keep in mind that you still need to check the DSD for the order of the dimensions.
- Create a key by adding filters for each dimension (except [TIME]).
Check the DataStructure/DataStructureComponents/DimensionList (SDMX2.1) element and take note of the order of the dimensions.
i. In the case of nama_10_gdp, the order of dimensions is:
- FREQ (frequency),
- UNIT (unit),
- NA_ITEM (indicator name),
- GEO (geographical dimension).
ii. For each of those dimensions, we can filter the elements based on the following rules:
- Key = [FREQ].[UNIT].[NA_ITEM].[GEO]
- "+" can be used as an OR for each of the dimensions (example: [GEO] = EU28+DE+FR+IT : retrieve data only for EU28, Germany, France and Italy)
- "." is used as a separator for the dimensions
- To wildcard a dimension (select all of its elements), leave its position empty. If the first dimension is wildcarded, the key starts with the separator "."; likewise, if the last dimension is wildcarded, the key ends with ".". For instance, a key of "..." wildcards all four dimensions of [FREQ].[UNIT].[NA_ITEM].[GEO] and retrieves the full dataset.
- The frequency ([FREQ], usually the first dimension) should only be given if the dataset contains a mix of several frequencies (e.g. annual and monthly data).
Keep in mind: the order of the dimensions is crucial. You need to keep the order as it is stated in the DimensionList in the DSD.
Now check the code list for each dimension by:
i. Looking at the DSD (Structure/Codelists/Codelist (id="CL_[DimensionName]") (SDMX2.1)). All Codes can be used as elements.
ii. Checking the dimension values in Data Explorer.
- Add [TIME] filtering, if relevant.
If you want to filter the time dimension, add the parameters "startPeriod" and/or "endPeriod" at the end of the URL after a "?" sign. For example, to retrieve data from 2006 up to and including 2009, the URL ends with "?startPeriod=2006&endPeriod=2009".
https://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/nama_10_gdp/key where key is replaced by the key built above ([FREQ].[UNIT].[NA_ITEM].[GEO]), followed by any [TIME] filtering.
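The key rules above can be sketched in a few lines of Python. This is an illustrative helper, not part of any Eurostat tooling: the name `build_key` is hypothetical, and the dimension order is hard-coded for nama_10_gdp, whereas in practice it should be read from the DSD of the dataflow in question.

```python
# Dimension order for nama_10_gdp as listed above. Other dataflows differ,
# so the order must always be taken from the DimensionList in the DSD.
DIMENSION_ORDER = ["FREQ", "UNIT", "NA_ITEM", "GEO"]


def build_key(filters, dimension_order=DIMENSION_ORDER):
    """Join per-dimension filters with '.', using '+' as OR within a dimension.

    Dimensions that are missing from `filters` are wildcarded (left empty).
    """
    unknown = set(filters) - set(dimension_order)
    if unknown:
        raise ValueError(f"Unknown dimension(s): {sorted(unknown)}")
    return ".".join("+".join(filters.get(dim, [])) for dim in dimension_order)


key = build_key({
    "UNIT": ["CP_MEUR"],
    "NA_ITEM": ["B1GQ"],
    "GEO": ["EU28", "DE", "FR", "IT"],
})
url = ("https://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/nama_10_gdp/"
       + key + "?startPeriod=2006&endPeriod=2009")
print(key)
print(url)
```

Calling `build_key({})` yields "...", i.e. the key in which every dimension is wildcarded.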
Examples with the "nama_10_gdp" dataflow (GDP and main components (output, expenditure and income))
These are REST calls delivering responses in SDMX-ML 2.1 with the generic schema. To get responses with the structure specific schema, set the HTTP header field "Accept" to application/vnd.sdmx.structurespecificdata+xml.
- Retrieving data expressed in unit "CP_MEUR" (Current prices, million euro) for indicator "B1GQ" (Gross domestic product at market prices) and GEO dimension "DE" (Germany), "FR" (France) and "IT" (Italy), limited to the years from 2010 to 2013:
- The same as above, yet only with data since 2010:
- The same as above, yet only with data for indicators "B1GQ" (Gross domestic product at market prices) and "P3" (Final consumption expenditure):
- The same as above, yet only for data until 2012 included:
- The same as above, yet without any filtering on geo, i.e. fetching all available GEO dimensions:
- The same as above, yet without any time filtering i.e. fetching the whole time series:
- The same as above, yet without any unit filtering, i.e. fetching all available units:
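Applying the key rules from the previous section, some of the queries described above could be written as follows. This is a reconstruction under assumptions rather than an authoritative list: [FREQ] is left empty in every key on the assumption that nama_10_gdp holds a single frequency, and only the descriptions whose time filter reads unambiguously are included.

```python
BASE = "https://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/nama_10_gdp/"

# (description, key, time filter); FREQ is left empty in every key (assumption).
EXAMPLES = [
    ("CP_MEUR, B1GQ, DE+FR+IT, 2010 to 2013",
     ".CP_MEUR.B1GQ.DE+FR+IT", "startPeriod=2010&endPeriod=2013"),
    ("same, but all data since 2010",
     ".CP_MEUR.B1GQ.DE+FR+IT", "startPeriod=2010"),
    ("same, for indicators B1GQ and P3",
     ".CP_MEUR.B1GQ+P3.DE+FR+IT", "startPeriod=2010"),
    ("B1GQ and P3, all GEO, no time filter",
     ".CP_MEUR.B1GQ+P3.", ""),
    ("B1GQ and P3, all GEO, all units, no time filter",
     "..B1GQ+P3.", ""),
]


def to_url(key, time_filter):
    """Append the key and, if present, the time filter to the base URL."""
    return BASE + key + ("?" + time_filter if time_filter else "")


urls = [to_url(key, time_filter) for _, key, time_filter in EXAMPLES]
for (label, _, _), url in zip(EXAMPLES, urls):
    print(f"{label}\n  {url}")
```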
Asynchronous file delivery process
The Eurostat SDMX Web Services have a built-in asynchronous file delivery process for responses above a certain size limit, because queries for large amounts of data (more than 30 thousand cells) may require processing time before delivery.
When a query can't be served immediately, the XML response delivered synchronously contains an error message (SOAP: SOAP fault, REST: HTTP 413). This message includes a URL pointing to the placeholder where the file with the data response is delivered as soon as the request has been completely processed.
Before issuing a request, you can estimate the size of the expected response, as detailed in this online help.
Example of asynchronous error message with SOAP
<S:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"> <S:Body>
Example of asynchronous error message with REST
<?xml version='1.0' encoding='utf-8'?>
<footer:Message code="413" severity="Infomation">
Due to the large query, the response will be written
to file which will be located under URL:
<common:Text xml:lang="en">Please check the location periodic every 5 minutes or at your preference.</common:Text>
Depending on the processing time, the client system consuming the response may have to check periodically whether the file with the data response is available. As long as the file is not available, any request to the placeholder URL results in a "Not Found" response (HTTP 404). The response file remains available for at least one hour after successful processing, and most probably longer, depending on the number of concurrent users of the service.
In case an issue is encountered by the Eurostat SDMX Web Services, the ultimately delivered response file would consequently contain an error message instead of the data response.
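The polling behaviour described above could be sketched as follows. This is a minimal illustration, not an official client: the function names are hypothetical, the five-minute default interval is taken from the server message quoted above, and the fetch function is injectable so that the loop can be demonstrated offline with a stand-in.

```python
import time
import urllib.error
import urllib.request


def http_fetch(url):
    """Return (status, body) for a GET request; HTTP errors become a status code."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def wait_for_file(url, fetch=http_fetch, interval_seconds=300,
                  max_attempts=12, sleep=time.sleep):
    """Poll the placeholder URL until the file exists or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 404:  # 404 only means "not ready yet"
            raise RuntimeError(f"Unexpected HTTP status {status} for {url}")
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"File not available after {max_attempts} attempts: {url}")


# Offline demonstration with a stand-in fetch function: two 404s, then the file.
responses = iter([(404, b""), (404, b""), (200, b"<data/>")])
result = wait_for_file("https://example.invalid/placeholder.xml",
                       fetch=lambda url: next(responses),
                       sleep=lambda seconds: None)
print(result)
```

Since the delivered file may itself contain an error message instead of data, a real client would still need to inspect the returned content.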
Size limitations to data extraction
Some datasets are very large. Delivering them may take time, and the Eurostat SDMX Web Services can serve only a limited number of such datasets simultaneously.
The SDMX Web Services limit by design the size of data that can be queried. Currently, the maximum is 1 million cells within a single response.
If the response contains more than 30 thousand cells, the data is delivered through the asynchronous file delivery process.
You can estimate the size of your response by multiplying together the number of selected elements in each dimension.
Identify first the number of dimensions in the query, and then check how many elements per dimension are selected.
E.g. for data held in "nama_10_gdp" (GDP and main components (output, expenditure and income)):
With all five dimensions,
- [FREQ] (frequency)
- [UNIT] (unit)
- [NA_ITEM] (indicator name)
- [GEO] (geographical dimension)
- [TIME] (time period)
and a count of elements in each dimension,
- [FREQ] = 1 element
- [UNIT] = 22 elements
- [GEO] = 44 elements
- [NA_ITEM] = 39 elements
- [TIME] = 43 elements
requesting the full dataset would result in a total of 1 x 22 x 44 x 39 x 43 = 1,623,336 cells.
As this exceeds the limit of 1 million cells, the response to a query for the complete nama_10_gdp dataset won't be delivered.
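The estimate above can be reproduced with a short Python sketch. The two thresholds are the ones stated in this section; the function names and the labels "synchronous", "asynchronous" and "rejected" are illustrative choices, not terms used by the service.

```python
from math import prod

ASYNC_THRESHOLD = 30_000   # above this, delivery is asynchronous
MAX_CELLS = 1_000_000      # above this, the response is not delivered


def estimate_cells(selected_counts):
    """Multiply the number of selected elements across all dimensions."""
    return prod(selected_counts.values())


def delivery_mode(cells):
    """Classify an estimated response size against the two limits."""
    if cells > MAX_CELLS:
        return "rejected"
    if cells > ASYNC_THRESHOLD:
        return "asynchronous"
    return "synchronous"


# Element counts for the full nama_10_gdp dataset, as listed above.
full = {"FREQ": 1, "UNIT": 22, "NA_ITEM": 39, "GEO": 44, "TIME": 43}
cells = estimate_cells(full)
print(cells, delivery_mode(cells))  # 1623336 rejected

# Restricting the query to one unit and one indicator shrinks it considerably.
subset = dict(full, UNIT=1, NA_ITEM=1)
print(estimate_cells(subset), delivery_mode(estimate_cells(subset)))  # 1892 synchronous
```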
File naming convention for REST served responses
This section describes the naming convention for files delivered by the Eurostat SDMX Web Services in response to REST requests.
The name of a response file always contains the parameters that were used in the query (or some placeholders like ALL or LATEST).
Please note that this naming convention is not applicable to files with a data response provided in a URL placeholder through the asynchronous file delivery process.
The following list shows some examples of the naming convention with some explanations:
|all_latest_ESTAT.xml||dataflows||The latest versions of all data flows for ESTAT agency.|
|all_latest_ESTAT_references=none_detail=full.xml||dataflows||The same as above, but without references and with full detail.|
|DSD_prc_hicp_midx_latest_ESTAT_references=none_detail=full.xml||datastructure||Latest version of prc_hicp_midx datastructure without references, but with full detail.|
|cdh_e_fos_..PC.FOS1.BE_ALL.xml||data||Data for the dataflow cdh_e_fos with no filtering on [FREQ] and [Y_GRAD], PC set for [UNIT], FOS1 for the [FOS07] dimension, and only for Belgium.|
|cdh_e_fos_ALL_ALL.xml||data||Full dataset for cdh_e_fos.|
|ef_alvege_.HA0...._ALL.xml||data||Dataflow ef_alvege with only filtering on [AGRAREA] dimension.|
|lfsi_emp_a_.T.EMP_LFS.EU27_ALL.xml||data||Dataflow lfsi_emp_a with filtering on all dimensions.|
|nama_10_gdp_A.CP_MEUR.B1GQ.DE_ALL_startPeriod=2000.xml||data||Dataflow nama_10_gdp with filtering on all dimensions and only data starting from 2000.|
The ALL at the end of data file names stands for the agency, which can be ESTAT or ALL depending on the request.
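For data requests, the pattern visible in the examples above can be sketched as a small Python helper. This is inferred only from the file names listed here and is not an official specification; the function name is hypothetical, and the use of ALL for a missing key is an assumption based on the cdh_e_fos_ALL_ALL.xml row.

```python
def expected_data_file_name(flow_ref, key="", agency="ALL", params=None):
    """Compose a file name as flowRef_key_agency[_name=value...].xml.

    An empty key is replaced by the placeholder ALL (assumption drawn
    from the full-dataset example in the list above).
    """
    parts = [flow_ref, key or "ALL", agency]
    parts += [f"{name}={value}" for name, value in (params or {}).items()]
    return "_".join(parts) + ".xml"


print(expected_data_file_name("cdh_e_fos", "..PC.FOS1.BE"))
print(expected_data_file_name("cdh_e_fos"))
print(expected_data_file_name("nama_10_gdp", "A.CP_MEUR.B1GQ.DE",
                              params={"startPeriod": "2000"}))
```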