"Standard Code Lists" project
Following a decision taken at the Eurostat Directors' meeting in December 2007 and confirmed in 2009, Unit B5 "Central Data and Metadata Services" is in charge of harmonising statistical code lists across Eurostat. The background, process and rules are explained below.
What are statistical code lists?
Statistical code lists are structural metadata on statistical concepts, generally used to describe the dimensions of data in data management. Multi-dimensional tables on the Eurostat website usually use several of these code lists. Where appropriate, the lists are based on official statistical classifications such as NACE Rev. 2 and ISCO, or on other code lists covering sex, age, time, working status, etc. In total, around 500 code lists exist in the Reference database.
One standard code list that needs to be defined particularly clearly is the list of flags. Flags are directly linked to data values and give supplementary information about an observation; they are represented by a code next to the actual value.
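Because a flag travels next to its value, any tool that reads flagged data has to separate the two. The following is a minimal sketch, assuming a cell holds a number (or ":" for a missing value) followed by at most one single flag code, separated by whitespace. The function name and the three flag codes are hypothetical and for illustration only; the authoritative flag list is the standard code list itself.

```python
from typing import Optional, Tuple

# Illustrative flag codes only (hypothetical); the real flag list is the
# standard code list maintained by Unit B5, not this dictionary.
EXAMPLE_FLAGS = {
    "e": "estimated",
    "p": "provisional",
    "c": "confidential",
}


def split_flagged_value(cell: str) -> Tuple[Optional[float], Optional[str]]:
    """Split a cell such as '12.5 p' into (12.5, 'p').

    Assumes value and flag are separated by whitespace, that at most one
    flag is present, and that ':' marks a missing value.
    """
    parts = cell.split()
    if not parts:
        raise ValueError("empty cell")
    raw_value = parts[0]
    flag = parts[1] if len(parts) > 1 else None
    # Reject flags that are not in the (illustrative) code list.
    if flag is not None and flag not in EXAMPLE_FLAGS:
        raise ValueError(f"unknown flag: {flag!r}")
    value = None if raw_value == ":" else float(raw_value)
    return value, flag


print(split_flagged_value("12.5 p"))  # (12.5, 'p')
print(split_flagged_value(": c"))     # (None, 'c')
```

Checking the flag against a code list, rather than accepting any letter, is exactly where a single harmonised flag list pays off: every reader validates against the same set.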
Why do they have to be harmonised?
At present, the great majority of the code lists used in the Eurostat Reference and production databases are not harmonised. As a result, different codes are sometimes used for the same statistical concept: for the manufacturing industry, for example, the codes "RD", "B0200", "SE0_4" and "TOT_MANUF" are used in four different production databases, while the standard code for this NACE section in the Reference database is "D".
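Until every database uses the standard codes, each local code has to be translated ("transcodified") into the standard one. A minimal sketch of such a transcodification table, using the four codes from the example above, could look like this. The database names `prod_db_a` to `prod_db_d` are hypothetical, since the text does not name the four databases; the table is keyed by database because the same local code could mean something different elsewhere.

```python
# Hypothetical database names; the local codes and the standard code "D"
# are taken from the manufacturing example in the text.
TRANSCODIFICATION = {
    ("prod_db_a", "RD"): "D",
    ("prod_db_b", "B0200"): "D",
    ("prod_db_c", "SE0_4"): "D",
    ("prod_db_d", "TOT_MANUF"): "D",
}


def to_standard_code(database: str, local_code: str) -> str:
    """Translate a production-database code into the standard code."""
    try:
        return TRANSCODIFICATION[(database, local_code)]
    except KeyError:
        # A missing entry is a gap in the mapping, not a valid code.
        raise KeyError(f"no mapping for {local_code!r} in {database!r}") from None


print(to_standard_code("prod_db_b", "B0200"))  # D
```

Every such table is one more thing to maintain and one more place for errors to creep in; using the standard code directly makes the table unnecessary.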
For both data producers and data users this situation is unsatisfactory, as it creates extra work and is a permanent source of errors. Unit B5 is therefore drawing up, releasing and managing standard statistical code lists; the consistent use of standard code lists in all statistical domains and at every stage of the data life cycle (CVD) will facilitate data management and exchange.
What are the general rules for harmonising code lists?
Several criteria are applied when harmonising statistical code lists. These are:
- As far as possible, they are based on official statistical classifications or widely used standards;
- Only alphanumeric characters plus "-" (dash) and "_" (underscore) are used, to avoid problems when the codes are processed by different software applications; "-" is used to define intervals, "_" to denote the aggregation of two codes;
- Preferably, an acronym of the code list is used as a prefix before numerical codes, to avoid problems with leading zeroes;
- Additional codes for aggregates are inserted into the standard code lists whenever these aggregates are needed for data production and dissemination;
- The code lists must be directly usable and cover the full variety of codes needed in a reference database.
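The character and prefix rules above can be checked mechanically before a new code is proposed. The following is a minimal sketch, assuming ASCII letters only and a deliberately naive reading of the separators (the first "-" marks an interval, the first "_" an aggregate of two codes). The function names and the example codes "Y15-24", "A_B" and "CL01" are hypothetical illustrations, not entries from an actual standard code list.

```python
import re

# Rule: only alphanumeric characters plus "-" and "_" (ASCII is assumed here).
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def check_code(code: str) -> list[str]:
    """Return the rule violations for a candidate code (empty list if none)."""
    problems = []
    if not ALLOWED.fullmatch(code):
        problems.append("contains characters other than A-Z, a-z, 0-9, '-' or '_'")
    # A purely numeric code starting with '0' can lose that zero when a
    # spreadsheet or database treats it as a number; a prefix avoids this.
    if code.isdigit() and code.startswith("0"):
        problems.append("numeric code with a leading zero; add a list prefix")
    return problems


def structure(code: str) -> tuple[str, ...]:
    """Naively read the separator conventions: '-' interval, '_' aggregate."""
    if "-" in code:
        low, high = code.split("-", 1)
        return ("interval", low, high)
    if "_" in code:
        left, right = code.split("_", 1)
        return ("aggregate", left, right)
    return ("single", code)


for candidate in ["Y15-24", "A_B", "01", "CL01", "A B"]:
    print(candidate, check_code(candidate), structure(candidate))
```

The leading-zero check is the reason for the prefix rule: `int("01")` is simply `1`, so a bare "01" and "1" become indistinguishable once the code passes through a numeric field, whereas a prefixed code such as "CL01" stays a string.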
Who does what?
Based on the work already done in the production units and in close co-operation with the domain managers, Unit B5 draws up and maintains the standard code lists. In the coming years an increasing number of these code lists will be finalised and disseminated.
Requests for new or additional codes should be forwarded to Unit B5, which will include them in the next regular revision of the relevant code list if the request is justified.
Unit B5 will disseminate these lists on RAMON, the Eurostat metadata server, where classifications, concepts and definitions, as well as glossaries, are already published. This means that the standard code lists will also be available to external users. In the future, following the development of new applications in Eurostat, the standard code lists will be uploaded to the Metadata Handler/SDMX registry.
How should these standard code lists be used?
The harmonised code lists are, or will be, used in the Eurostat Reference database (Eurobase) and in the dissemination environment. In addition, domain managers in Eurostat should use them whenever possible in their production databases and in their data transmission formats. Even where immediate implementation at production-database level is not possible, domain managers are asked to adopt the harmonised codes whenever the opportunity arises. Adopting the harmonised codes throughout the data production chain would reduce the need for transcodification and the risk of errors.
These harmonised codes could also be extended to Member States, either directly or through the data transmission formats defined by Eurostat in which they participate.
What is next?
The task of maintaining standard code lists starts once these lists are published. Unit B5 will publish additional standard code lists as soon as they are agreed with the domain managers concerned.
A medium-term task of Unit B5 will be to explore the possibility of harmonising the standard code lists, or at least the concepts they contain, at international level. Work has started within the SDMX (Statistical Data and Metadata eXchange) sponsoring organisations, where the concepts contained in a number of code lists are to be harmonised at international level.
Available through: RAMON
The standard code lists are available in English, French and German; descriptions of aggregates are available only in English.
Address of responsible agency
Statistical Office of the European Union
Standard code list team
E-mail: ESS Standard Code Lists