This domain of statistics defines principles, concepts and procedures that keep data confidential while still permitting its use for statistical purposes.
Statistical confidentiality is a fundamental principle of European statistics. EU Regulation 223/2009 on European statistics defines confidential data as:
"…data which allow statistical units to be identified, either directly or indirectly, thereby disclosing individual information".
Direct identification means identification of the respondent (statistical unit) from their formal identifiers (name, address, identification number).
Indirect identification means inferring a respondent's identity from a combination of variables or characteristics (e.g. age, gender and education).
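To make the idea of indirect identification concrete, the following minimal sketch counts how many records share each combination of characteristics. A record whose combination occurs only once is potentially identifiable even though it carries no name or identification number. The sample records and the function names `combination_counts` and `unique_records` are hypothetical and serve only as an illustration; they are not taken from any official tool.

```python
from collections import Counter

# Hypothetical microdata: no formal identifiers, only characteristics.
records = [
    {"age": 34, "gender": "F", "education": "tertiary"},
    {"age": 34, "gender": "F", "education": "tertiary"},
    {"age": 34, "gender": "M", "education": "secondary"},
    {"age": 71, "gender": "F", "education": "primary"},
]


def combination_counts(rows, keys):
    """Count how many records share each combination of the given variables."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys):
    """Return the records whose combination of values occurs exactly once."""
    counts = combination_counts(rows, keys)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


quasi_identifiers = ["age", "gender", "education"]
at_risk = unique_records(records, quasi_identifiers)
for row in at_risk:
    print("Unique combination:", row)
```

In this toy dataset the first two records share the same combination and so shield each other, while the last two are each unique on age, gender and education taken together.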
Individual data collected by statistical offices for statistical compilation, whether they refer to natural or legal persons, must be kept strictly confidential and used exclusively for statistical purposes.
Statistical disclosure control (SDC) methods
Statistical confidentiality is ensured through:
- physical protection: the data is securely stored and not accessible to anyone without explicit authorisation.
- statistical disclosure control (SDC): methods for reducing the risk that statistical units are identified when the statistical data is being published, including:
  - tabular data protection: for aggregate information on respondents presented in tables (using suppression, rounding and interval publication).
  - microdata protection: for information on statistical units (using local suppression, sampling, global recoding, top and bottom coding, rounding, rank swapping and microaggregation).
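A few of the methods listed above can be sketched in simplified form: suppression and rounding for tables, and top and bottom coding and microaggregation for microdata. The thresholds, rounding base, group size and function names below are assumptions chosen for illustration, not values or procedures prescribed by any regulation or statistical office. Real SDC also has to handle secondary suppression and risk assessment, which this sketch leaves out.

```python
def suppress_small_cells(table, threshold=3):
    """Tabular suppression: hide non-zero counts below an assumed threshold."""
    return {
        cell: (None if 0 < count < threshold else count)
        for cell, count in table.items()
    }


def round_to_base(count, base=5):
    """Tabular rounding: nearest multiple of base (non-negative integers)."""
    return base * ((count + base // 2) // base)


def top_bottom_code(value, lower, upper):
    """Top and bottom coding: clamp extreme values to the given bounds."""
    return max(lower, min(value, upper))


def microaggregate(values, k=3):
    """Microaggregation: replace each value by the mean of its group of
    at least k neighbouring values (neighbours in sorted order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Merge a short trailing group into the previous one so that every
    # group still contains at least k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


# Hypothetical table of respondent counts by region.
table = {"A": 120, "B": 2, "C": 0, "D": 7}
print(suppress_small_cells(table))
print({cell: round_to_base(n) for cell, n in table.items()})

# Hypothetical income values for seven respondents.
incomes = [10, 12, 11, 50, 52, 51, 90]
print([top_bottom_code(v, lower=11, upper=60) for v in incomes])
print(microaggregate(incomes, k=3))
```

Each function trades some detail for protection: a suppressed or rounded cell no longer reveals a small exact count, and a coded or microaggregated value no longer points to a single extreme respondent.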
Access to confidential data for scientific purposes
At EU level, access to confidential data (microdata) for scientific purposes is the only exception to the rule that confidential data can only be used to produce European statistics.
EU legislation in this field is currently being revised to expand the available datasets and make access to them simpler, including from multiple access points.