Collection and Use of Secondary Data (Theme)


National Statistical Institutes (NSIs) aim to produce undisputed and up-to-date statistics about their society. This requires up-to-date and reliable data. These can be data that the organisation itself collects (primary data) or data that already exist in the outside world (secondary data). Examples of the latter are administrative sources maintained by other governmental organisations and sources nowadays identified as ‘Big data’, such as data available on the internet and data generated by sensors. Mindful of the costs and response burden involved in collecting primary data, more and more NSIs aim to maximise the use of secondary data for statistics production. The entire process of obtaining already existing data is generally referred to as the collection of secondary data. This chapter discusses the advantages and disadvantages of this approach from an official statistics point of view.

To be in a position to use data from secondary sources, NSIs need to know which secondary sources exist in their country and whether they are allowed to access them on a regular basis. Next, the ‘fitness for use’ of the data source for official statistics needs to be determined. There are many ways to determine this. The most important approaches focus on the metadata quality of the source, on the data quality of the input data, and on the data quality of the statistics produced. When a secondary data source is found suitable for use, delivery agreements with the data provider need to be set up. It is considered good practice to assign an NSI employee as the contact person for the source and the data provider. For important statistics that depend on the availability of the secondary data, ways to deal with any interruption or delay in the delivery need to be prepared in advance. These so-called fall-back scenarios may range from very simple actions, such as directly contacting the data provider, to the use of complex models that are able to cope with any missing data.
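As an illustration only, the escalation from simple to more complex fall-back scenarios could be sketched as follows. This is a minimal sketch, not a procedure prescribed by this chapter: the `Delivery` record, the `fallback_action` function, the action labels and the five-day grace period are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Delivery:
    """Hypothetical record of one agreed delivery from a data provider."""
    source: str
    due: date
    received: Optional[date] = None  # None means nothing has arrived yet


def fallback_action(delivery: Delivery, today: date, grace_days: int = 5) -> str:
    """Pick a fall-back action for a delivery (illustrative thresholds only)."""
    if delivery.received is not None:
        return "process"  # data arrived, no fall-back needed
    days_late = (today - delivery.due).days
    if days_late <= 0:
        return "wait"  # the agreed delivery date has not passed yet
    if days_late <= grace_days:
        return "contact_provider"  # simple action: ask the provider directly
    return "estimate_with_model"  # complex action: model the missing data


# Example: a register delivery that was due on 10 January and has not arrived.
late = Delivery(source="example_register", due=date(2024, 1, 10))
print(fallback_action(late, today=date(2024, 1, 12)))
print(fallback_action(late, today=date(2024, 1, 31)))
```

In practice the thresholds and the available actions would follow from the delivery agreement with the data provider and from the importance of the statistics concerned.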

Apart from administrative data, some more recent work also focuses on the use of innovative secondary sources, so-called Big data, for statistics. Since many of these projects are still ongoing and these sources are not yet used for statistics production, this chapter is limited to what is already known about the use of secondary sources for statistics.


