API - Detailed guidelines - SDMX3.0 API - data query
Overview
Data queries allow retrieving statistical data. Entire datasets, individual observations, or anything in between, can be retrieved using filters on dimensions (including time).
The data can be retrieved in a variety of formats (JSON, XML, CSV, etc.).
Depending on the request, a data query can result in a (potentially very) large response, in which case the data is delivered asynchronously. For more information, please read the page API - Detailed guidelines - Asynchronous API.
It is important to remember that only the latest version of each statistical observation is available in the system. When a statistical observation is updated, its previous value is lost and cannot be returned.
Query URL syntax
The generic SDMX 3.0 syntax is the following:
protocol://ws-entry-point/data/{context}/{agencyID}/{resourceID}/{version}/{key}?{c}&{firstNObservations}&{lastNObservations}&{attributes}&{measures}&{returnData}
Parameter | Type | Description | Default | Multiple values? |
---|---|---|---|---|
context | Must be set to dataflow | Data can be reported against a data structure, a dataflow or a provision agreement. This parameter allows selecting the desired context for data retrieval. | * | No |
agencyID | A string compliant with the SDMX common:NCNameIDType | The agency maintaining the artefact for which data have been reported. | * | No |
resourceID | A string compliant with the SDMX common:IDType | The id of the artefact for which data have been reported. | * | No |
version | A string compliant with the allowed SDMX versioning schemes | The version of the artefact for which data have been reported. | * | No |
key | A string compliant with the KeyType defined in the SDMX Open API specification. | The combination of dimension values identifying the slice of the cube for which data should be returned. Wildcarding is supported via the * operator. For example, if the key D.USD.EUR.SP00.A identifies the daily US dollar exchange rate against the euro, then the key D.*.EUR.SP00.A can be used to retrieve the data for all currencies against the euro. Any dimension value omitted at the end of the key is treated as a wildcard, e.g. D.USD is equivalent to D.USD.*.*.* | * | No |
c | Map | Filter data by component value. For example, if a structure defines a frequency dimension (FREQ) and the code A (Annual) is an allowed value for that dimension, the following can be used to retrieve annual data: c[FREQ]=A. Multiple values are supported, using a comma (,) as separator. The plus (+) can be used to combine conditions on the same component (e.g. c[TIME_PERIOD]=ge:2020-01+le:2020-12). Operators may be used too (see table with operators below). This parameter can be used in addition to, or instead of, the key parameter. | | Yes |
firstNObservations | Positive integer | The maximum number of observations to be returned for each of the matching series, starting from the first observation. | | No |
lastNObservations | Positive integer | The maximum number of observations to be returned for each of the matching series, counting back from the most recent observation. | | No |
attributes | String | This parameter specifies the attributes to be returned. Possible options are: dsd (all the attributes defined in the data structure definition), msd (all the reference metadata attributes), dataset (all the attributes attached to the dataset-level), series (all the attributes attached to the series-level), obs (all the attributes attached to the observation-level), all (all attributes), none (no attributes), {attribute_id} (the ID of one or more attributes the caller is interested in). | dsd | No |
measures | String | This parameter specifies the measures to be returned. Possible options are: all (all measures), none (no measure), {measure_id} (the ID of one or more measures the caller is interested in). | all | No |
returnData | | Only available for the TSV and SDMX-CSV formats. Supported values are: ALL (all time-series are returned in the output, including the ones having no data / flag) and DATA_ONLY (only the time-series for which data or a flag exists are returned in the output). | DATA_ONLY | |
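The URL template and its parameters can be assembled programmatically. The following is a minimal Python sketch, not an official client: the `BASE_URL` value and the `build_data_url` helper are assumptions made for illustration (the base follows the `/sdmx/3.0` prefix used in the examples on this page), and the real protocol and entry point must be substituted.

```python
from urllib.parse import quote, urlencode

# Hypothetical entry point (assumption); replace with the real protocol and host.
BASE_URL = "https://ws-entry-point/sdmx/3.0"


def build_data_url(agency_id, resource_id, version, key="", params=None,
                   base_url=BASE_URL):
    """Assemble a data query URL; the context is fixed to 'dataflow'."""
    path = ["data", "dataflow", agency_id, resource_id, version]
    if key:
        path.append(key)
    # Keep the wildcard '*' readable in the path; escape anything unsafe.
    url = base_url.rstrip("/") + "/" + "/".join(quote(p, safe="*") for p in path)
    if params:
        # Leave the characters used by the c[...] filter syntax unescaped.
        url += "?" + urlencode(params, safe="[]:,+*", quote_via=quote)
    return url


url = build_data_url(
    "ESTAT", "T2020_20", "1.0", key="A.PC_GDP",
    params={"c[geo]": "AT", "lastNObservations": 1},
)
print(url)
```

Whether a given server expects the brackets of `c[...]` to be percent-encoded is not stated here, so the sketch keeps them literal, as in the examples on this page.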
The following rules apply:
- Multiple values for a parameter must be separated using a comma (,).
- Default values do not need to be supplied if they are the last element in the path.
- Operators can be used to refine the applicability of the c query parameter:
Operator | Meaning | Note |
---|---|---|
eq | Equals | Default if no operator is specified and there is only one value (e.g. c[FREQ]=M is equivalent to c[FREQ]=eq:M ) |
ne | Not equal to | |
lt | Less than | |
le | Less than or equal to | |
gt | Greater than | |
ge | Greater than or equal to | |
co | Contains | |
nc | Does not contain | |
sw | Starts with | |
ew | Ends with | |
Operators appear as a prefix to the component value(s) and are separated from them by a colon (:), e.g. c[TIME_PERIOD]=ge:2020-01+le:2020-12.
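As an illustration of the operator syntax, the sketch below formats values for the c parameter in Python. The helper names `condition` and `c_filter` are hypothetical, and the reading of the plus sign as "combine conditions on the same component" is an assumption based on the time-range example above.

```python
# Operators listed in the table above.
VALID_OPERATORS = {"eq", "ne", "lt", "le", "gt", "ge", "co", "nc", "sw", "ew"}


def condition(value, operator=None):
    """Format one condition; without an operator the bare value is used."""
    if operator is None:
        return str(value)
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    return f"{operator}:{value}"


def c_filter(component, *conditions, combine="+"):
    """Return the (name, value) pair of a c query parameter.

    combine="+" joins conditions (assumed to mean that all must hold);
    combine="," lists alternative values.
    """
    if combine not in {"+", ","}:
        raise ValueError("combine must be '+' or ','")
    return f"c[{component}]", combine.join(conditions)


name, value = c_filter(
    "TIME_PERIOD", condition("2020-01", "ge"), condition("2020-12", "le")
)
print(f"{name}={value}")
```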
Response types
The following media types can be used with data queries:
- application/vnd.sdmx.data+xml;version=3.0.0
- application/vnd.sdmx.data+csv;version=2.0.0;labels=[id|name|both];timeFormat=[original|normalized];keys=[none|obs|series|both]
- application/vnd.sdmx.genericdata+xml;version=2.1
- application/vnd.sdmx.structurespecificdata+xml;version=2.1
- application/vnd.sdmx.data+csv;version=1.0.0;labels=[id|both];timeFormat=[original|normalized]
The default format is highlighted in bold.
SDMX-CSV allows two optional parameters to be set via the media type: labels and timeFormat. Their default values are id and original respectively.
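The SDMX-CSV media type string can be composed from its parameters. The Python sketch below is illustrative only: the function name `sdmx_csv_media_type` is hypothetical, the allowed values are copied from the media types listed above, and the keys parameter of version 2.0.0 is left out for brevity.

```python
# Allowed values per SDMX-CSV version, as listed in the media types above.
ALLOWED_LABELS = {"1.0.0": {"id", "both"}, "2.0.0": {"id", "name", "both"}}
ALLOWED_TIME_FORMATS = {"original", "normalized"}


def sdmx_csv_media_type(labels="id", time_format="original", version="2.0.0"):
    """Build the value of an Accept header for SDMX-CSV."""
    if version not in ALLOWED_LABELS:
        raise ValueError(f"unsupported SDMX-CSV version: {version}")
    if labels not in ALLOWED_LABELS[version]:
        raise ValueError(f"labels={labels} is not listed for version {version}")
    if time_format not in ALLOWED_TIME_FORMATS:
        raise ValueError(f"unknown timeFormat: {time_format}")
    return (
        f"application/vnd.sdmx.data+csv;version={version}"
        f";labels={labels};timeFormat={time_format}"
    )


headers = {"Accept": sdmx_csv_media_type(labels="both")}
print(headers["Accept"])
```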
Key parameter
The key parameter defines the values of the dimensions, in the order in which the data structure defines them.
The c (component) parameter can be used to define additional filters on dimensions, on top of the key parameter.
The c parameter does not filter on attribute or measure values.
Key parameter supports wildcard "*"
The wildcard "*" means that no filtering is applied on that dimension.
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0/*.*.*.DE
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0/*.*.T2020_20.DE
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0/*.PC_GDP.T2020_20.DE
Partial key definition (omitting the last dimension(s)) is supported
Only trailing dimension positions can be omitted, e.g.:
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0/A.PC_GDP.T2020_20.DE
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0/A.PC_GDP.T2020_20
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0/A.PC_GDP
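The wildcard and partial-key rules can be combined in a small helper. This Python sketch assumes the dimension order freq, unit, indic_eu, geo for T2020_20, as suggested by the CSV headers shown on this page; `build_key` is a hypothetical name.

```python
# Dimension order assumed from the CSV headers of the T2020_20 examples.
T2020_20_DIMENSIONS = ["freq", "unit", "indic_eu", "geo"]


def build_key(dimension_order, values):
    """Build a key string, using '*' for unfiltered dimensions.

    Trailing wildcards are dropped, since omitted trailing positions
    are equivalent to wildcards.
    """
    unknown = set(values) - set(dimension_order)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    parts = [values.get(dim, "*") for dim in dimension_order]
    while parts and parts[-1] == "*":
        parts.pop()
    return ".".join(parts)


print(build_key(T2020_20_DIMENSIONS, {"geo": "DE"}))
print(build_key(T2020_20_DIMENSIONS, {"freq": "A", "unit": "PC_GDP"}))
```

An empty result means that no key segment is needed at all, which corresponds to a query without a key.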
Attributes and measures parameter
Combinations of the SDMX 3.0 attributes and measures parameters correspond to the values of the detail parameter known from SDMX 2.1:
attributes \ measures | no value / "all" | "none" |
---|---|---|
no value / "dsd" / "all" | supported (equivalent to detail = "full") | not supported |
"series" | not supported | supported (equivalent to detail = "nodata") |
"obs" | not supported | not supported |
"none" | supported (equivalent to detail = "dataonly") | supported (equivalent to detail = "serieskeysonly") |
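For clients migrating from SDMX 2.1, the supported combinations in the table above can be expressed as a lookup. This is a Python sketch; the names `DETAIL_TO_PARAMS` and `params_for_detail` are hypothetical.

```python
# Supported combinations, taken from the table above.
DETAIL_TO_PARAMS = {
    "full": {"attributes": "all", "measures": "all"},
    "dataonly": {"attributes": "none", "measures": "all"},
    "nodata": {"attributes": "series", "measures": "none"},
    "serieskeysonly": {"attributes": "none", "measures": "none"},
}


def params_for_detail(detail):
    """Translate an SDMX 2.1 detail value into SDMX 3.0 query parameters."""
    try:
        # Return a copy so callers can add further parameters safely.
        return dict(DETAIL_TO_PARAMS[detail.lower()])
    except KeyError:
        raise ValueError(f"unsupported detail value: {detail}") from None


print(params_for_detail("nodata"))
```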
Request FULL
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0?attributes=all&measures=all
Accept: application/vnd.sdmx.data+csv; version=2.0.0
All data is returned (default)
STRUCTURE,STRUCTURE_ID,freq,unit,indic_eu,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG,TARGET_FLAG,TARGET
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2016,3.12,e,,3.76
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2017,3.06,,,3.76
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2018,3.09,e,,3.76
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2019,3.13,,,3.76
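A response such as the one above can be read with a standard CSV parser. The Python sketch below uses only the standard library and an excerpt of the sample rows; `read_observations` is a hypothetical helper, and converting OBS_VALUE to a float is a choice made for this example.

```python
import csv
import io

# Excerpt of the sample response shown above.
SAMPLE = """STRUCTURE,STRUCTURE_ID,freq,unit,indic_eu,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG,TARGET_FLAG,TARGET
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2016,3.12,e,,3.76
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2017,3.06,,,3.76
"""


def read_observations(text):
    """Parse SDMX-CSV text into a list of dicts keyed by column name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # OBS_VALUE is absent when measures=none and may be empty otherwise.
        value = row.get("OBS_VALUE")
        row["OBS_VALUE"] = float(value) if value else None
        rows.append(row)
    return rows


for obs in read_observations(SAMPLE):
    print(obs["geo"], obs["TIME_PERIOD"], obs["OBS_VALUE"], obs["OBS_FLAG"])
```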
Request DataOnly
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0?attributes=none&measures=all
Accept: application/vnd.sdmx.data+csv; version=2.0.0
The observations (OBS_VALUE only) are returned for each series, but not the attributes
STRUCTURE,STRUCTURE_ID,freq,unit,indic_eu,geo,TIME_PERIOD,OBS_VALUE
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2016,3.12
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2017,3.06
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2018,3.09
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT,2019,3.13
Request NoData
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0?attributes=series&measures=none
Accept: application/vnd.sdmx.data+csv; version=2.0.0
Only the series level attributes are returned for each series
DATAFLOW,LAST UPDATE,freq,unit,indic_eu,geo,TARGET_FLAG,TARGET
ESTAT:T2020_20(1.0),14/12/21 23:00:00,A,PC_GDP,T2020_20,AT,,3.76
ESTAT:T2020_20(1.0),14/12/21 23:00:00,A,PC_GDP,T2020_20,BA,,
ESTAT:T2020_20(1.0),14/12/21 23:00:00,A,PC_GDP,T2020_20,BE,,3
ESTAT:T2020_20(1.0),14/12/21 23:00:00,A,PC_GDP,T2020_20,BG,,1.5
Request SeriesKeysOnly
protocol://ws-entry-point/sdmx/3.0/data/dataflow/ESTAT/T2020_20/1.0?attributes=none&measures=none
No data or attributes are returned for each series, only the series key
STRUCTURE,STRUCTURE_ID,freq,unit,indic_eu,geo
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,AT
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,BA
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,BE
dataflow,ESTAT:T2020_20(1.0),A,PC_GDP,T2020_20,BG
Return data parameter
In SDMX, the boundaries of a dataset can be expressed as a DataConstraint describing the available positions for each dimension. A position is available when it is used in at least one time-series.
This listing of positions for each dimension is used as the basis when building a data result for a received query.
For example, suppose a dataset exposes yearly data since 2000, where data for Croatia is available only since 2004. The following query in TSV:
GEO = "Croatia" And TIME_PERIOD >= 1990 and TIME_PERIOD <= 2006
returns a TSV file containing the (empty) columns 2000, 2001, 2002 and 2003 in addition to the columns 2004, 2005 and 2006.
Note also that, in some filtered results, positions can be present although no actual data exists for them.
This information is encoded in the JSONSTAT format extension "positions-with-no-data". In the above example, the positions 2000, 2001, 2002 and 2003 of the TIME_PERIOD dimension are represented as positions that did not match any data.
This is enough to understand the boundaries defined for the cube-oriented JSONSTAT format. TSV, however, is a time-series-oriented format and does not include lines for non-existing time-series.
For example, the following query
Returns:
Only the existing time-series are present in the output, so this TSV result contains only 3 lines: the time-series c[FREQ]=A&c[INDIC_DE]=POPTRT&c[GEO]=FX does not exist.
This is the default behaviour, as it reduces the response size and is more accurate. API clients are expected to be able to process such results, since this is how Eurostat offers its content through the heavily used Bulk Download facilities.
If this behaviour is blocking for some client applications, the extra parameter &returnData=ALL can be used to make the TSV contain these expected "missing" lines.
This option is also available for the SDMX-CSV format.
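A client that prefers the default DATA_ONLY output can add the missing series itself. The Python sketch below only approximates the idea behind returnData=ALL and does not describe the server's implementation: `fill_missing_series` is a hypothetical helper, and the codes DE, FR and IT as well as the figures are made up for illustration.

```python
from itertools import product


def fill_missing_series(series, positions):
    """Return one entry per combination of positions, empty when no data exists.

    series    -- dict mapping a series key (tuple of codes) to {period: value}
    positions -- list of code lists, one per dimension, in key order
    """
    return {key: dict(series.get(key, {})) for key in product(*positions)}


# Illustrative values only: three series were returned, the FX one was not.
returned = {
    ("A", "POPTRT", "DE"): {"2020": 1.0},
    ("A", "POPTRT", "FR"): {"2020": 2.0},
    ("A", "POPTRT", "IT"): {"2020": 3.0},
}
requested_positions = [["A"], ["POPTRT"], ["DE", "FR", "IT", "FX"]]

complete = fill_missing_series(returned, requested_positions)
for key, observations in complete.items():
    print(".".join(key), observations)
```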