Statistical disclosure control (SDC), also called Statistical Disclosure Limitation (SDL), is aimed at protecting data that are to be released by a national statistical institute (NSI). Protection means that individual entities (such as businesses) cannot (readily) be identified and, more particularly, that confidential or sensitive information about such entities is not released to third parties. The purpose is to prevent misuse of data intended for statistical purposes. Instead of focusing on aggregates, SDC directs its attention at individual entities and their responses, or at the information that is available about them. This shift of attention may even be prompted inadvertently, because certain aggregates happen to consist of only one or a few entities, one of which dominates the total. Publishing such an aggregate then effectively publishes information about that entity.
The aim of SDC is twofold: first, to identify the risks involved in releasing data, and second, to modify 'risky' data in such a way that the disclosure risk of the resulting data is negligible. The challenge is to modify the data so that no (possibly sensitive) information about individual entities is disclosed, directly or indirectly, while the protected data remain of interest for statistical research and policy studies. The aim of SDC is not to hamper statistics but to hamper non-statistical use of the data, such as 'unearthing' information on particular individuals. Statistics is not about individuals but about groups of individuals. There is therefore room to protect the privacy of individuals while still serving society's interest in statistical information for research, policy making or general interest. Note that, in the context of this handbook, 'individuals' usually means individual businesses.
For business statistics, tables are the usual form in which information is released to users outside statistical institutes. Business populations are typically so skewed that releasing business data safely as public-use microdata is generally not possible: large units cannot be protected without rendering the microdata useless. In some countries, researchers from bona fide institutes may be given access to microdata under strict conditions and/or in safe settings. However, the final results of such research also take the form of aggregates, such as tables. In practice, therefore, disclosure control of tables is a bigger issue for business data than the protection of microdata, and for that reason the present module focuses on the SDC of tables.
For tabular data, the first task is to define rules that separate safe from unsafe data. Once these rules have been specified, they can be applied to the tables at hand. If cells are found that are unsafe according to the rules applied, the next step is to eliminate them by modifying the tables, for which a range of techniques is available. The problem is to apply these techniques so that the resulting tables are safe (according to the rules that have to be considered) while the tables are modified as little as possible. A similar problem exists for microdata, but for the reasons given above it is not discussed in the present module.
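The first step, applying rules that separate safe from unsafe cells, can be sketched as follows. The text above does not specify which rules are used. This sketch assumes two rules that are common in the SDC literature: a minimum-frequency (threshold) rule and an (n, k)-dominance rule, the latter targeting exactly the situation where one or a few entities dominate a cell total. The parameter values, the function names and the example table are hypothetical and for illustration only. In practice, each NSI sets its own parameters.

```python
from typing import Dict, List

# Hypothetical rule parameters (assumptions, not prescribed by this module).
MIN_CONTRIBUTORS = 3    # threshold rule: fewer contributors than this is unsafe
DOM_N, DOM_K = 1, 80.0  # (n, k)-dominance: the n largest units exceed k% of the total


def violates_threshold(contributions: List[float], min_n: int = MIN_CONTRIBUTORS) -> bool:
    """A cell with fewer than min_n contributors is considered unsafe."""
    return len(contributions) < min_n


def violates_dominance(contributions: List[float], n: int = DOM_N, k: float = DOM_K) -> bool:
    """A cell is unsafe if its n largest contributions exceed k% of the cell total."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k / 100.0 * total


def unsafe_cells(table: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """For each cell, list the rules it violates (an empty list means safe)."""
    result: Dict[str, List[str]] = {}
    for cell, contributions in table.items():
        reasons = []
        if violates_threshold(contributions):
            reasons.append("threshold")
        if violates_dominance(contributions):
            reasons.append("dominance")
        result[cell] = reasons
    return result


# Hypothetical turnover contributions of individual businesses per table cell.
table = {
    "A-North": [120.0, 95.0, 80.0, 60.0],
    "A-South": [500.0, 40.0],
    "B-North": [900.0, 30.0, 20.0, 10.0],
}

for cell, reasons in unsafe_cells(table).items():
    print(cell, "unsafe" if reasons else "safe", reasons)
# A-North safe []
# A-South unsafe ['threshold', 'dominance']
# B-North unsafe ['dominance']
```

Note that "B-North" passes the threshold rule yet is flagged, because one business accounts for almost the whole total. This sketch only flags unsafe cells. The subsequent step of modifying the table with minimal information loss is not shown.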
For more detailed information about Statistical Disclosure Control issues, we refer to Hundepool et al. (2012), Hundepool and De Wolf (2011), Willenborg and De Waal (2001) and Willenborg and De Waal (1996).