Weighting and Estimation - Main Module (Theme)


The present module gives an overview of the methods that can be used to obtain estimates of parameters such as totals, means or ratios from the observed sample data. It is assumed that the data have already been processed to treat potential errors and item non-response (see the modules “Statistical Data Editing – Main Module” and “Imputation – Main Module” for an introduction to the treatment of errors and item non-response).

In official statistics, probability sampling designs are commonly used, so a design weight can be associated with each sampled unit. This design weight equals the inverse of the unit's inclusion probability and can be thought of as the number of population units that the sampled unit represents. Hence, a simple method to obtain estimates of the target parameters is to use these design weights to inflate (expand) the sample observations (see subsection 2.1). Design weights are closely related to the sampling design implemented for the survey (see the module “Sample Selection – Main Module”). Moreover, design weights can be adjusted to account for unit non-response (see subsection 2.2), and/or modified to take auxiliary information into account (Särndal et al., 1992). Examples of the use of external information are the calibration estimator (see the module “Weighting and Estimation – Calibration”) and the GREG estimator (see the module “Weighting and Estimation – Generalised Regression Estimator”), which is a special case of the calibration estimator.
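As an illustration of expansion by design weights, the sketch below computes the Horvitz-Thompson estimate of a total, the sum over the sample of d_k * y_k with d_k = 1/pi_k, and estimates a ratio as the quotient of two such estimated totals. It is a minimal sketch assuming the inclusion probabilities pi_k of the sampled units are known; the function names and the data are hypothetical.

```python
def design_weights(inclusion_probs):
    """Design weight d_k = 1 / pi_k for each sampled unit."""
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return [1.0 / p for p in inclusion_probs]


def ht_total(y, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum over the sample of d_k * y_k."""
    return sum(d * v for d, v in zip(design_weights(inclusion_probs), y))


def ht_ratio(y, x, inclusion_probs):
    """Estimate of the ratio Y / X as the quotient of two Horvitz-Thompson totals."""
    return ht_total(y, inclusion_probs) / ht_total(x, inclusion_probs)


# Three sampled units with made-up inclusion probabilities 0.1, 0.2 and 0.5,
# i.e. design weights 10, 5 and 2.
y = [10, 20, 30]
x = [1, 2, 3]
pi = [0.1, 0.2, 0.5]
print(ht_total(y, pi))      # 10*10 + 5*20 + 2*30
print(ht_ratio(y, x, pi))
```

Each sampled unit "stands for" d_k population units, which is why the weighted sum estimates the population total.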

The previous estimators are unbiased or approximately unbiased under the randomisation (or design-based) approach, in which properties are assessed over the set of all possible samples. Note that even when a model is assumed (as for GREG), the properties of the estimators do not depend on the model, and the estimators remain (approximately) design-unbiased even if the model fails. For this reason, this class of methods is robust. However, their efficiency does depend strongly on the model: how well the auxiliary variables explain the target variable affects the variance of the estimators.
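To make the design-based notion of unbiasedness concrete, the following sketch enumerates every possible simple random sample without replacement (SRSWOR) of size n from a tiny, made-up population and averages the resulting expansion estimates of the total. Under SRSWOR each design weight is N/n; individual estimates vary from sample to sample, but their average over all possible samples equals the true total. The population values and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def expansion_total(sample, N):
    """Expansion (HT) estimate under SRSWOR: each unit has pi_k = n / N, so weight N / n."""
    return N / len(sample) * sum(sample)


def all_sample_estimates(population, n):
    """Estimate of the total for every possible SRSWOR sample of size n."""
    N = len(population)
    return [expansion_total(s, N) for s in combinations(population, n)]


population = [3, 8, 1, 12, 6]            # made-up values; the true total is 30
estimates = all_sample_estimates(population, n=2)
print(min(estimates), max(estimates))    # single samples can be far from 30
print(mean(estimates))                   # average over all 10 samples equals the true total
```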

In particular, when the distribution of the target variable in the population is highly skewed, as often happens in business surveys, representative outliers may occur in the sample. The values of such units are correct, so they do not need to be edited (see the topic “Statistical Data Editing”). Nevertheless, even though the estimators remain unbiased, the presence of these outlying units has a large impact on their variance. The module “Weighting and Estimation – Outlier Treatment” gives an overview of methods suggested in the literature for reducing the variance of the estimates while keeping the resulting bias under control.
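The effect of a representative outlier on precision can be seen with the usual SRSWOR variance estimator of the expanded total, N^2 (1 - n/N) s^2 / n, where s^2 is the sample variance. The sketch below is illustrative only: the turnover values are invented, and a single large but correct value is substituted into an otherwise typical sample of size 10 drawn from N = 1000 units.

```python
from statistics import variance


def srswor_total_and_var(sample, N):
    """Expansion estimate of the total and its unbiased variance estimate under SRSWOR."""
    n = len(sample)
    total = N / n * sum(sample)
    var = N ** 2 * (1 - n / N) * variance(sample) / n   # variance() uses the n - 1 divisor
    return total, var


N = 1000
typical = [12, 15, 9, 11, 14, 10, 13, 12, 16, 8]   # invented turnover values
with_outlier = typical[:-1] + [400]                 # one large, correct value replaces the last unit

for label, s in [("typical", typical), ("with outlier", with_outlier)]:
    total, var = srswor_total_and_var(s, N)
    print(f"{label:>12}: total = {total:,.0f}, standard error = {var ** 0.5:,.0f}")
```

The single outlying value does not need editing, yet it multiplies the estimated standard error many times over, which is the problem that outlier-treatment methods address.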

Another relevant approach to estimation is the model-based approach. Unlike the design-based approach, where, as stated above, properties are assessed over the set of all possible samples, in this framework an assumed model is the basis for deriving estimators that are optimal in terms of the model mean square error: the Best Linear Unbiased Predictor (BLUP) (Royall, 1970; Valliant et al., 2000). In official statistics, model-based estimators are applied in specific situations, such as when the sample size is not large enough to obtain estimates with sufficient accuracy (small area estimators; see also the module “Weighting and Estimation – Small Area Estimation”). A second important field of application of model-based estimation is preliminary estimation, where, for short-term statistics, a provisional estimate is calculated on a sub-sample of the sample units. Self-selection of units into the preliminary sample may be the most relevant issue for preliminary estimates. Moreover, when the sample is selected with a non-probability mechanism, model-based estimators can be used for inference and their model-based variance can be evaluated.
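As a simple instance of the prediction approach, assume the ratio model y_k = beta * x_k + e_k with E(e_k) = 0 and Var(e_k) proportional to x_k, and assume the population total of the auxiliary variable x is known. Under this model the BLUP of beta is sum(y_s) / sum(x_s), and the predicted total adds the observed sample values to the model predictions for the non-sampled units. The model, function name and figures below are assumptions chosen for illustration, not the specific estimators of the full document.

```python
def ratio_blup_total(y_sample, x_sample, x_total):
    """Prediction estimate of the total of y under the assumed ratio model
    y_k = beta * x_k + e_k, E(e_k) = 0, Var(e_k) proportional to x_k.

    Under this model the BLUP of beta is sum(y_s) / sum(x_s); the total is the
    observed sum plus beta_hat times the x-total of the non-sampled units.
    """
    sum_y, sum_x = sum(y_sample), sum(x_sample)
    if sum_x <= 0:
        raise ValueError("the sampled x values must have a positive sum")
    beta_hat = sum_y / sum_x
    x_nonsampled = x_total - sum_x   # auxiliary total of the units not in the sample
    return sum_y + beta_hat * x_nonsampled


# Hypothetical example: 3 sampled units from a population whose x-total is 500.
print(ratio_blup_total([20, 35, 50], [10, 15, 25], x_total=500))
```

Algebraically the result simplifies to beta_hat times the known x-total (the classical ratio estimator), but writing it as observed plus predicted values makes the prediction logic explicit. No design weights appear: the validity of the estimate rests on the model holding.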

The particular features of panel surveys are also highlighted. In a panel survey, the same units are observed on several occasions (waves), which makes it possible to reduce the variance of estimators and to estimate longitudinal parameters (e.g., gross changes and measures of frequency). Cross-sectional and longitudinal weights have to be determined according to the target parameters (see subsection 2.6).
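One well-known source of the variance reduction concerns estimates of change between waves. Because the same units are observed twice, the two wave estimates are often positively correlated, and Var(Y2_hat - Y1_hat) = V1 + V2 - 2 * rho * sqrt(V1 * V2) is then smaller than with independent samples (rho = 0). The sketch assumes the wave variances V1, V2 and the correlation rho are already known; the function name and numbers are hypothetical.

```python
import math


def var_of_change(var_t1, var_t2, rho):
    """Variance of the estimated change (wave-2 estimate minus wave-1 estimate).

    rho is the correlation between the two wave estimates; a positive rho,
    typical when the same units are re-interviewed, lowers the variance.
    """
    if not -1 <= rho <= 1:
        raise ValueError("rho must lie in [-1, 1]")
    return var_t1 + var_t2 - 2 * rho * math.sqrt(var_t1 * var_t2)


print(var_of_change(4.0, 4.0, 0.0))   # independent samples
print(var_of_change(4.0, 4.0, 0.7))   # panel with correlated wave estimates
```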

Finally, the use of administrative data is mentioned in subsection 2.9.

To conclude the review of relevant issues in weighting and estimation, subsection 2.10 highlights some of the issues most commonly encountered in practice.


To read the entire document, please access the pdf file (link under "Related Documents" on the right-hand side of this page).


Your feedback is appreciated. Please send your remarks, suggestions for improvement, etc. to memobust@cbs.nl.