Statistics Explained

Archive:EU statistics on income and living conditions (EU-SILC) methodology – sampling

Revision as of 16:13, 3 March 2015 by Megliem

This article is part of the Eurostat online publication EU statistics on income and living conditions (EU-SILC) methodology, which describes the dimensions along which the EU-SILC indicators are disseminated in Eurostat's dissemination database within the overall domain of Income and living conditions. Since the indicators are multidimensional, they are analysed simultaneously along several dimensions and published in separate datasets with different combinations of dimensions. The articles present these dimensions and provide information on their methodological limitations.

Introduction

EU-SILC is a sample survey. The legislation specifies that data shall be based on nationally representative probability samples and prescribes minimum effective sample sizes, but leaves the choice of a specific sampling design to each country. This article describes the main characteristics of the sampling design and the sampling error for the main EU-SILC indicator, AROPE.

Sampling frames

A major strength of EU-SILC is that it uses the best sampling frame available in each National Statistical Institute (NSI). According to the framework, data are to be based on a nationally representative probability sample of the population residing in private households within the country, irrespective of language, nationality or legal residence status. All private households and all persons aged 16 and over within the household are eligible for the operation. Persons living in collective households and in institutions are generally excluded from the target population. The sampling frame and the methods of sample selection should ensure that every individual and household in the target population is assigned a known, non-zero probability of selection. As shown in Table 1, for the 2011 EU-SILC operation the vast majority of countries used population registers, the national census or a master sample derived from the census.

Sampling design

The sample design describes all the steps to be carried out when selecting a sample of households or persons. It aims to improve the quality of the estimates produced and to control costs. Countries follow various strategies to achieve this objective. The table below summarizes the sampling design used in each country for the 2011 operation. Countries choose a specific sampling design according to the structure of the country and its population, the information already available and budgetary constraints.

The most widely used sampling design is stratified multistage sampling. Only five countries do not use stratification criteria to draw their sample: Malta, Denmark and Iceland use a simple random sample design, and Sweden and Norway use a systematic sample. All the remaining countries apply one or more stratification criteria, mainly geographical. Among them, the majority use multi-stage sampling, with the exception of Luxembourg, Germany, Cyprus, Slovakia, Switzerland, Austria and Lithuania, which use a stratified simple random sample. Estonia uses a systematic stratified sample, and Hungary is the only country to apply a different sampling design for drawing each rotational group.

Every year, countries send Eurostat general information on the sampling design used, together with detailed microdata-level information on the stratum and PSU from which each household is drawn. The efficiency of the sampling design has a large impact on the standard error and should be monitored over time. On the other hand, changing the design is extremely costly.
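As an illustration of what a stratified two-stage design involves, the following is a minimal Python sketch, not the procedure of any particular country. It assumes a hypothetical frame of (stratum, PSU, household) tuples and equal-probability selection without replacement at both stages; real designs often select PSUs with probability proportional to size. The function name and parameters are illustrative only.

```python
import random
from collections import defaultdict

def stratified_two_stage_sample(frame, n_psu_per_stratum, n_hh_per_psu, seed=0):
    """Draw PSUs within each stratum, then households within each drawn PSU.

    frame: list of (stratum, psu, household_id) tuples (hypothetical layout).
    Returns a list of (stratum, psu, household_id, design_weight) tuples,
    where the design weight is the inverse of the selection probability.
    """
    rng = random.Random(seed)

    # Group the frame: stratum -> PSU -> list of households.
    by_stratum = defaultdict(lambda: defaultdict(list))
    for stratum, psu, household in frame:
        by_stratum[stratum][psu].append(household)

    sample = []
    for stratum, psus in sorted(by_stratum.items()):
        # First stage: simple random sample of PSUs within the stratum.
        n_psu = min(n_psu_per_stratum, len(psus))
        p_psu = n_psu / len(psus)
        for psu in rng.sample(sorted(psus), n_psu):
            # Second stage: simple random sample of households within the PSU.
            households = psus[psu]
            n_hh = min(n_hh_per_psu, len(households))
            p_hh = n_hh / len(households)
            for household in rng.sample(households, n_hh):
                sample.append((stratum, psu, household, 1.0 / (p_psu * p_hh)))
    return sample
```

Every household in the frame has a known, non-zero selection probability (the product of the two stage probabilities), which is what allows design weights to be attached to the sampled households.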

Standard error estimation

Given the high policy relevance of EU-SILC, stakeholders increasingly ask for accuracy measures of the published indicators and for measures of the significance of net changes in indicators over time, so that the evolution of social exclusion can be monitored correctly. As seen above, EU-SILC is a complex survey involving different sampling designs in different countries. For this reason, standard textbook methods for calculating accuracy measures are not directly applicable. Eurostat, with the substantial contribution of Net-SILC2, has put in place a simple method for standard error estimation based on linearization coupled with the ultimate cluster approach.

Linearization, some background

Suppose we wish to estimate $Y = \sum_{k \in U} y_k$, where $y_k$ is the value of a study variable $y$ for unit $k$ of the population $U$. The variable $y$ can be either continuous, in which case $Y$ is the sum of all values of $y$ over the population (e.g., total household income), or dichotomous (e.g., 1 if the person is unemployed, 0 otherwise). If $y$ is a dummy variable, $Y$ refers to the total number of units which fall in the underlying category (e.g., total number of unemployed persons in the population). Let $\hat{Y}$ be an estimator of $Y$, for which an estimate of the standard error is wanted. The variance estimator of $\hat{Y}$ is given by:

$$\hat{V}(\hat{Y}) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( y_{hi} - \bar{y}_h \right)^2 \qquad (1)$$

where:

• $h$ is the stratum number, with a total of $H$ strata
• $i$ is the primary sampling unit (PSU) number within stratum $h$, with a total of $n_h$ PSUs. We assume $n_h \geq 2$ for all $h$.
• $j$ is the household number within PSU $i$ of stratum $h$, with a total of $m_{hi}$ households
• $w_{hij}$ is the sampling weight for household $j$ in PSU $i$ of stratum $h$
• $y_{hi} = \sum_{j=1}^{m_{hi}} w_{hij} \, y_{hij}$ and $\bar{y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}$
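The variance estimator described above (the between-PSU variation of weighted PSU totals within each stratum, scaled by n_h / (n_h - 1) and summed over strata) can be sketched in a few lines of Python. This is an illustrative sketch rather than Eurostat's production code; the record layout (stratum, PSU, weight, value) is an assumption made for the example.

```python
from collections import defaultdict

def ultimate_cluster_variance(records):
    """Variance estimate of a weighted total from between-PSU variation.

    records: iterable of (stratum, psu, weight, value) tuples, one per
    household (hypothetical layout).
    """
    # Weighted PSU totals: y_hi = sum over households j of w_hij * y_hij.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, value in records:
        psu_totals[stratum][psu] += weight * value

    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        mean_h = sum(totals.values()) / n_h
        squared_deviations = sum((t - mean_h) ** 2 for t in totals.values())
        variance += n_h / (n_h - 1) * squared_deviations
    return variance
```

The standard error is the square root of the returned value. Strata with a single PSU raise an error here; in practice such strata are usually merged with a neighbouring stratum before estimation.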

The variance formula (1) applies to linear indicators, i.e. means, totals and proportions. However, most of the EU-SILC key indicators are non-linear (e.g., the median income or the Gini coefficient). In order to estimate the variance of non-linear statistics, the linearisation method may be used (Deville 1999, Osier 2009). The principle is to reduce a non-linear statistic to a linear form by retaining only the first-order term of an infinite Taylor-like series, which yields a linear function of the sample observations. As we know how to estimate the variance of linear functions of means and totals, the variance of the linear approximation can be calculated and used as an approximation of the variance of the non-linear statistic. The linearisation procedure is justified on the basis of asymptotic properties of large samples and populations.

Assuming  is a complex non-linear parameter, the variance of an estimator follows the same expression as (1), except that the study variable is replaced by the “linearised” variable :

     (2)

For instance, if is the ratio of two population totals, then we have for all k.
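For a ratio of two totals R = Y/X, a commonly used linearised variable is z_k = (y_k - R x_k) / X. The sketch below computes it in Python, assuming that the unknown X and R are replaced by their weighted sample estimates (a usual plug-in step, stated here as an assumption). The function name and record layout are hypothetical.

```python
def linearise_ratio(records):
    """Linearised variable for the ratio of two weighted totals.

    records: list of (weight, y, x) tuples (hypothetical layout).
    Returns a list with z_k = (y_k - R_hat * x_k) / X_hat for each record,
    where X_hat and R_hat are weighted sample estimates.
    """
    x_hat = sum(w * x for w, _, x in records)
    y_hat = sum(w * y for w, y, _ in records)
    r_hat = y_hat / x_hat
    return [(y - r_hat * x) / x_hat for _, y, x in records]
```

With x_k = 1 for every unit the ratio is a weighted proportion p and z_k reduces to (y_k - p) / N_hat, where N_hat is the sum of the weights. By construction the weighted sum of the z_k is zero, which is a convenient sanity check before feeding them into the variance formula.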

The ultimate cluster approach is a simplification in which the variance is calculated from the variation among Primary Sampling Unit (PSU) totals only. This method requires first-stage sampling fractions to be small, which is nearly always the case. It offers great flexibility and simplifies the calculation of variances. It can also be generalized to calculate the variance of differences between one year and another.

Results

We applied the method to estimate the standard error and confidence intervals of the AROPE indicator (at risk of poverty or social exclusion). This indicator is the proportion of persons in at least one of the following three situations: at risk of poverty, i.e. below the national poverty threshold (60% of the national median equivalized income); severely materially deprived; living in a household with very low work intensity. We treated this indicator as a proportion, assuming that the poverty threshold is a fixed amount equal to its point estimate. Depending on the characteristics and availability of data for each country, we used different variables to specify strata and cluster information. In particular, countries were split into three groups:
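The definition of AROPE as a weighted proportion of persons in at least one of three situations can be illustrated with a short Python sketch. The dictionary keys used below are hypothetical names chosen for the example, not EU-SILC variable names, and the three flags are assumed to have been derived beforehand.

```python
def arope_rate(persons):
    """Weighted share of persons in at least one of the three situations.

    persons: list of dicts with the hypothetical keys "weight",
    "at_risk_of_poverty", "severely_deprived" and "low_work_intensity"
    (the last three are booleans).
    """
    total_weight = sum(p["weight"] for p in persons)
    arope_weight = sum(
        p["weight"]
        for p in persons
        if p["at_risk_of_poverty"]
        or p["severely_deprived"]
        or p["low_work_intensity"]
    )
    return arope_weight / total_weight
```

A person who falls into several situations is counted only once, which is why the indicator is computed on the union of the three flags rather than on their sum.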

1) For BE, BG, CZ, IE, EL, ES, FR, IT, LV, HU, NL, PL, PT, RO, SI, UK and HR, whose sampling design can be treated as two-stage stratified, we used DB050 (primary strata) for strata specification and DB060 (Primary Sampling Unit) for cluster specification;

2) For DE, EE, CY, LT, LU, AT, SK, FI and CH, whose sampling design can be treated as one-stage stratified, we used DB050 for strata specification and DB030 (household ID) for cluster specification;

3) For DK, MT, SE, IS and NO, whose sampling design can be treated as simple random sampling, we used DB030 for cluster specification and no strata.
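The grouping above amounts to a lookup from country code to the pair of variables used as strata and clusters. A minimal Python sketch of that lookup follows; the country codes and the variable names DB050, DB060 and DB030 come from the list above, while the function and constant names are illustrative.

```python
TWO_STAGE_STRATIFIED = {
    "BE", "BG", "CZ", "IE", "EL", "ES", "FR", "IT", "LV",
    "HU", "NL", "PL", "PT", "RO", "SI", "UK", "HR",
}
ONE_STAGE_STRATIFIED = {"DE", "EE", "CY", "LT", "LU", "AT", "SK", "FI", "CH"}
SIMPLE_RANDOM = {"DK", "MT", "SE", "IS", "NO"}

def design_variables(country):
    """Return (strata_variable, cluster_variable) for a country code.

    The strata variable is None when no stratification is specified.
    """
    if country in TWO_STAGE_STRATIFIED:
        return ("DB050", "DB060")  # primary strata, primary sampling unit
    if country in ONE_STAGE_STRATIFIED:
        return ("DB050", "DB030")  # primary strata, household ID
    if country in SIMPLE_RANDOM:
        return (None, "DB030")     # no strata, household ID
    raise KeyError(f"no design specification for country {country!r}")
```

For example, `design_variables("FR")` returns `("DB050", "DB060")`, and `design_variables("DK")` returns `(None, "DB030")`.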

The approach used can take account of stratification, multi-stage selection, unequal probabilities of inclusion for the sample units and re-weighting for unit non-response. However, it does not reflect the gain in accuracy resulting from calibration weighting. The effect of calibration on the variance can be significant, especially in countries where powerful auxiliary information from income registers has been used to adjust the sampling weights. In some cases this may lead to overestimation of sampling errors. In addition, the values of the indicators may differ from those published on the Eurostat website because of data revisions. Results are shown in Table 2 and demonstrate the overall good accuracy of SILC data. The survey has in fact been designed to yield a 95% confidence interval of about 1 percentage point around a hypothetical poverty rate of 15%.

The same approach has been used to calculate the variance of net change over two consecutive years. In order to monitor progress towards agreed policy goals, particularly in the context of the Europe 2020 strategy, users are particularly interested in the evolution of social indicators. However, interpreting differences between point estimates at different waves may be misleading. It is therefore necessary to estimate the standard error of these differences in order to judge whether or not the observed differences are statistically significant.
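One way to carry the ultimate cluster idea over to net change is to apply the between-PSU variance formula to PSU-level differences, which implicitly accounts for the covariance between the two years. The Python sketch below makes several assumptions that are not spelled out in the text: PSUs keep the same identifiers in both years, each record carries an already linearised value, and a PSU present in only one year contributes zero for the other. It is an illustration of the principle, not the exact procedure used for the published results.

```python
from collections import defaultdict

def net_change_variance(year1, year2):
    """Variance estimate of the change (year2 minus year1) of an indicator.

    year1, year2: iterables of (stratum, psu, weight, z) tuples, where z is
    the linearised variable for that year (hypothetical layout).
    """
    # PSU-level difference of weighted totals: d_hi = z_hi(year2) - z_hi(year1).
    diff_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, z in year2:
        diff_totals[stratum][psu] += weight * z
    for stratum, psu, weight, z in year1:
        diff_totals[stratum][psu] -= weight * z

    variance = 0.0
    for stratum, totals in diff_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((d - mean_h) ** 2 for d in totals.values())
    return variance
```

When the two years are strongly positively correlated at PSU level, the differences vary little across PSUs and the resulting variance is smaller than the sum of the two yearly variances.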

Estimated standard errors and confidence intervals (based on a normality assumption) for net changes in AROPE between 2009 and 2010 are shown in Table 3. If a confidence interval does not include 0, the difference in AROPE between 2009 and 2010 is statistically significant at the chosen level of confidence.
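The significance check described above can be written as a small Python helper. This sketch assumes that the standard error of the change has already been estimated and uses the normal approximation; the function name and the figures in the usage note are made up for illustration and are not taken from Table 3.

```python
from statistics import NormalDist

def net_change_interval(estimate_t1, estimate_t2, se_change, confidence=0.95):
    """Normal-approximation confidence interval for a change between two years.

    Returns (change, lower, upper, significant), where `significant` is True
    when the interval does not include 0.
    """
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    change = estimate_t2 - estimate_t1
    lower = change - z * se_change
    upper = change + z * se_change
    significant = lower > 0 or upper < 0
    return change, lower, upper, significant
```

With hypothetical rates of 23.0% and 24.1% and a standard error of the change of 0.4 percentage points, the interval lies entirely above 0 and the change is flagged as significant; with a standard error of 0.8 the interval includes 0 and it is not.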

Further Eurostat information

Database

  • Living conditions and welfare (livcon), see:
Income and living conditions (ilc)
People at risk of poverty or social exclusion (Europe 2020 strategy) (ilc_pe)
Main indicator - Europe 2020 target on poverty and social exclusion (ilc_peps)

Dedicated section

Income and living conditions (ilc)
