
Census 2021 round (cens_21)


National Reference Metadata in Euro SDMX Metadata Structure (ESMS)

Compiling agency: Istat - Italian National Institute of Statistics


The data set transmitted to Eurostat includes all the hypercubes requested under Regulation (EC) No 763/2008 and the following implementing regulations: Regulation (EU) 2017/543, Regulation (EU) 2017/712 and Regulation (EU) 2017/881.

All the definitions and classifications are compliant with the abovementioned Regulations. The few exceptions are detailed under the related topics.

17 December 2024

The information is given separately for each census topic.

The EU programme for the 2021 population and housing censuses includes data on persons, private households, family nuclei, conventional dwellings and living quarters.

The persons enumerated in the 2021 census are those who were usually resident on the Italian territory at the census reference date.

Data are available at different levels of geographical detail in EU countries: national, NUTS2/NUTS3 regions, local administrative units (LAU) and grids.

See the following sub-concepts.

There are no particular reasons for census data unreliability.

With reference to usual residence, the signs of life methodology was used to estimate the over-coverage and under-coverage of the population register, while the error (mean squared error) of the register-based count was measured through the field surveys.

For more details on the methodology used to estimate the data on the different topics, see the following annexes:

  1. ANNEX attached in section 18.5;
  2. ANNEX attached in section 19 (Sheet 2. "Data Sources").

Counts of statistical units should be expressed in numbers and, where needed, as rates per inhabitants enumerated in the country.

Data capturing and Coding
Concerning data capture for survey data, the data collection is totally paperless (no paper questionnaires used). The coding of territorial variables is performed directly during enumeration (no open text answers are collected; the respondents select the relevant answer through drop-down menus). 

Concerning register data, all the administrative sources received by Istat are processed, before being used in any production process, to assign a SIM code to each record in the source. After pseudonymisation, the SIM output is made available to internal users, who use it within their statistical processes to produce the outputs (i.e. by linking the different sources through the SIM code).

The SIM code is the only variable used to link the different sources for persons data. Another code used is the address identifier (CUI), which serves both for the survey sampling frame and for the production of census data at sub-municipal level. Details on the assignment of the SIM code are provided in section 18.3.2.
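The linking step described above can be illustrated with a minimal sketch. It assumes each source is a list of records carrying a pseudonymised key in a field named `sim`; the field names, the function `link_by_sim` and the toy data are all hypothetical and are not taken from Istat's systems.

```python
from collections import defaultdict


def link_by_sim(*sources):
    """Merge records from several sources that share the same SIM code."""
    linked = defaultdict(dict)
    for source in sources:
        for record in source:
            # The SIM code is the only linkage key; all other fields are merged.
            fields = {k: v for k, v in record.items() if k != "sim"}
            linked[record["sim"]].update(fields)
    return dict(linked)


# Invented toy data: two sources keyed by hypothetical SIM codes.
population_register = [{"sim": "S001", "municipality": "M1"}]
education_register = [
    {"sim": "S001", "education": "tertiary"},
    {"sim": "S002", "education": "secondary"},
]

linked = link_by_sim(population_register, education_register)
```

Because the key is pseudonymised, the linked output carries no direct identifiers; a person found in only one source simply keeps the fields of that source.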

 

Use of signs of life (SoL) to estimate the population at LAU2 level
The SoL method is applied to implement the usual residence definition according to Regulation (EC) No 763/2008. All the sources listed under 18.1.3 (organised in a smaller number of statistical registers) are used to estimate the actual presence of a person at their registered address, i.e. to correct the Population Register (ANPR according to the Italian acronym, which is the basis of the census count) through classification criteria applied to individual records in statistical registers. For details see section 18.6.

 

Data compilation: pre-processing, de-duplication, editing and imputation
The data compilation operations concern only the A-survey and L-survey sample data collected in the field; only the household type and nucleus variables were determined following the Editing and Imputation process of the data from the Population Register of Individuals, enriched with information from ANPR (National Register of the Resident Population).

Data pre-processing
As the surveys are paperless, the information provided by respondents or surveyors is automatically recorded in the 'Acquisition System'.

The Acquisition System contains all data compilations, including partial compilations saved by respondents before completing and sending the questionnaire; therefore, the first operation performed in the "Production System" is to identify, for each questionnaire code, the compilation containing the most information.
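A minimal sketch of this selection step follows. It assumes that "most information" can be approximated by the number of non-missing answers, which is an assumption of this example and not a rule stated in the source; the record layout and the function `most_complete` are hypothetical.

```python
def most_complete(compilations):
    """Keep, for each questionnaire code, the compilation with most answers."""
    best = {}
    for comp in compilations:
        code = comp["codquest"]
        # Assumed measure of information: count of non-missing answers.
        filled = sum(1 for value in comp["answers"].values() if value is not None)
        if code not in best or filled > best[code][0]:
            best[code] = (filled, comp)
    return {code: comp for code, (_, comp) in best.items()}


# Invented toy data: Q1 was saved twice, once partially and once in full.
compilations = [
    {"codquest": "Q1", "answers": {"sex": "F", "age": None}},
    {"codquest": "Q1", "answers": {"sex": "F", "age": 34}},
    {"codquest": "Q2", "answers": {"sex": "M", "age": 51}},
]

selected = most_complete(compilations)
```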

Individuals belonging to households that have fully or partially completed (partial non-response) the questionnaire are subjected to the SIM-code attribution processes for the anonymisation of individual records.

De-duplication phase
In the de-duplication phase, duplicate questionnaires and duplicate individuals are identified, within the same survey type and regardless of location. These duplicates are retained in the first version of the data tables (a version that can always be consulted) but are not reflected in the individual data tables, which move on to the next stages (check and correction; estimation). 

Individuals enumerated in both the A and the L survey are not considered duplicates, so the same individual can appear twice in the individual data table, once per survey. This situation is handled in the integration processes with the register data, where one record has to be chosen (in 2021 the record from the A-survey was chosen).
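The two rules above (de-duplication within a survey type, then a choice between surveys at integration) can be sketched as follows. Individuals are assumed to be identified by a `sim` field and the survey by a `survey` field; keeping the first occurrence of a duplicate is an assumption of this example, as the source does not say which copy is retained.

```python
def deduplicate(records):
    """Drop repeated individuals within the same survey type."""
    seen, kept = set(), []
    for rec in records:
        key = (rec["survey"], rec["sim"])
        if key not in seen:  # assumption: the first occurrence is kept
            seen.add(key)
            kept.append(rec)
    return kept


def choose_for_integration(records, preferred="A"):
    """Pick one record per individual, preferring the given survey."""
    chosen = {}
    for rec in records:
        if rec["sim"] not in chosen or rec["survey"] == preferred:
            chosen[rec["sim"]] = rec
    return chosen


# Invented toy data: S1 appears twice in A and once in L, S2 only in L.
records = [
    {"survey": "A", "sim": "S1"},
    {"survey": "A", "sim": "S1"},
    {"survey": "L", "sim": "S1"},
    {"survey": "L", "sim": "S2"},
]

deduplicated = deduplicate(records)
chosen = choose_for_integration(deduplicated)
```

The default `preferred="A"` mirrors the 2021 choice reported in the text.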

The data in the Production System, pseudonymised through the SIM code and de-duplicated, undergo the Editing and Imputation procedures jointly for the two surveys (A and L).

Editing and Imputation (E&I) of individual variables defining the legal population
The first phase of the E&I process focuses on the individual variables that define the structure of the population (gender; date of birth; citizenship, Italian or foreign) and determine the paths for filling in the questionnaire.
The methodology applies concordance checks between the variables surveyed, those in the Population Register and those in the stock data from ANPR (National Register of the Resident Population). Records showing a discrepancy in at least one of the variables considered are corrected by deterministic imputation algorithms that use, where possible, other variables from the questionnaire and auxiliary variables built specifically for this purpose, also in order to keep the information on individuals within the same household congruent.
Once these variables have been fixed more or less definitively, the E&I processes continue in parallel for the different subject areas of the questionnaire.

Subject area A: E&I of demographic and family variables
The first step is the E&I of the variables subject to congruence constraints between individuals belonging to the same household (sex, age, relationship, marital status, year of marriage, marital status before last marriage, foreign citizenship status, etc.), in order to restore congruence between them. The E&I methodology provides for the following steps, carried out cyclically until no erroneous records remain:

a) execution of the Family Procedure (see "The Family Procedure (FP)" below);
b) editing runs to detect families with inconsistent data; deterministic or manual corrections are performed on these data;
c) execution of the Family Procedure on erroneous families only;
d) possible reiteration from point a).
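The cyclical structure of steps a) to d) can be sketched as a loop that stops when no erroneous families remain. This is only an outline under stated assumptions: the function `run_ei_cycle`, the `max_cycles` safeguard and the three toy callables standing in for the Family Procedure, the editing rules and the corrections are hypothetical and do not reproduce Istat's software.

```python
def run_ei_cycle(households, family_procedure, is_inconsistent, correct,
                 max_cycles=10):
    """Repeat detect/correct/re-run until no erroneous households remain."""
    households = [family_procedure(h) for h in households]       # step a
    cycles = 0
    while True:
        errata = [i for i, h in enumerate(households)
                  if is_inconsistent(h)]                          # step b: detect
        if not errata:
            return households, cycles
        if cycles == max_cycles:
            raise RuntimeError("erroneous households remain after max_cycles")
        for i in errata:
            # step b: correct, then step c: re-run on erroneous units only
            households[i] = family_procedure(correct(households[i]))
        cycles += 1                                               # step d


# Toy stand-ins (hypothetical): the declared size must match the member list.
def toy_family_procedure(h):
    kind = "one-person" if len(h["members"]) == 1 else "multi-person"
    return dict(h, type=kind)


def toy_is_inconsistent(h):
    return h["size"] != len(h["members"])


def toy_correct(h):
    return dict(h, size=len(h["members"]))


households = [{"size": 2, "members": ["a"]}, {"size": 1, "members": ["b"]}]
result, cycles = run_ei_cycle(households, toy_family_procedure,
                              toy_is_inconsistent, toy_correct)
```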

Since the Family Procedure checks and corrects the information and, at the same time, deterministically assigns the variables defining the family type and the households, a validation of the aggregated data is carried out at the end of the cyclical process and, if necessary, targeted corrections are made, leading to new correction cycles.
In the second phase, the other individual variables on citizenship and residence are corrected by identifying, via compatibility rules, the incorrect individuals and correcting them using deterministic, probabilistic or manual methods.
The household type and nucleus variables (26 in all) were not estimated on the basis of the sample data of the permanent population census, but are the result of a complex and innovative E&I process carried out using data from the Population Register enriched with information from the ANPR stock data.

Subject area B: E&I of socio-economic and commuting variables
These E&I processes first involve checking and correcting the core variables using a methodology that can be schematised according to the following steps:

a) internal compatibility checking of responses by blocks of variables in the questionnaire to identify erroneous data;
b) verification of the answers by comparison with benchmark information (micro and macro) present on Istat's thematic registers (Integrated Bases of Educational Qualifications; Labour Register) or with internal information sources (origin-destination matrices for commuting);
c) correction of erroneous data with deterministic, probabilistic or donor methods.

The correction of variable values is then followed by a validation of the aggregated data which, if necessary, determines targeted corrections on which new correction cycles can be performed.

Next, the E&I of the non-core variables is carried out by means of compatibility rules with the variables corrected in the first step and by using various methods to identify missing, anomalous or incorrect data; deterministic or probabilistic imputation methods are then applied to these data.

Subject area C: E&I of the housing variables
To correct the variables in the Housing section of the questionnaire, we first identified the clusters of co-habiting households, since households co-habiting in the same accommodation have to report the same information on its characteristics. The identification of these clusters concerned the questionnaires collected by the Area survey. Where several households co-habited in the same dwelling, each household was assigned a questionnaire. For each housing unit with several co-habiting households, a 'father' household was identified. For the households co-habiting with the 'father' unit, the surveyor entered in the ID_UNIT_PADRE field the questionnaire code (CODQUEST) of the 'father' household, so that the households co-habiting in the same dwelling could be uniquely identified. In summary, the E&I strategy for co-habiting households involves the following steps:

  1. Identification of co-habiting clusters;
  2. Correction of co-habiting variables (COAB, NFAM, NOCC);
  3. Correction of co-housing variables.
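The cluster identification in step 1 can be sketched from the two fields named above, ID_UNIT_PADRE and CODQUEST. The sketch assumes that a 'father' household has an empty ID_UNIT_PADRE and heads its own cluster; that convention and the function `cohabiting_clusters` are assumptions of this example.

```python
from collections import defaultdict


def cohabiting_clusters(questionnaires):
    """Group questionnaire codes by the code of their 'father' household."""
    clusters = defaultdict(list)
    for q in questionnaires:
        # Assumption: a 'father' household has no ID_UNIT_PADRE of its own.
        father = q.get("ID_UNIT_PADRE") or q["CODQUEST"]
        clusters[father].append(q["CODQUEST"])
    # Only dwellings with more than one household form a co-habiting cluster.
    return {f: members for f, members in clusters.items() if len(members) > 1}


# Invented toy data: Q2 co-habits with 'father' Q1, Q3 lives alone.
questionnaires = [
    {"CODQUEST": "Q1"},
    {"CODQUEST": "Q2", "ID_UNIT_PADRE": "Q1"},
    {"CODQUEST": "Q3"},
]

clusters = cohabiting_clusters(questionnaires)
```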

Once the co-habiting households had been identified and the information on the respective dwellings made consistent, we proceeded with the correction of the housing variables. The first step was to check for violated rules; values were then imputed by a probabilistic method that respects the distribution of the values in the subset of exact records. For each variable, the strata within which the imputation was to be carried out were identified.
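One simple way to respect the distribution of the exact records within a stratum is to draw each imputed value at random from the valid values observed in that stratum. The sketch below illustrates that idea only; it is not the method documented by Istat, and the function `impute_within_strata`, the field names and the validity rule are hypothetical.

```python
import random
from collections import defaultdict


def impute_within_strata(records, variable, stratum_of, is_valid, rng):
    """Replace invalid values with a random draw from valid ones in the stratum."""
    donors = defaultdict(list)
    for rec in records:
        if is_valid(rec):
            donors[stratum_of(rec)].append(rec[variable])
    imputed = []
    for rec in records:
        pool = donors[stratum_of(rec)]
        if is_valid(rec) or not pool:
            imputed.append(dict(rec))  # valid, or no exact records to draw from
        else:
            # Drawing from observed values reproduces their empirical distribution.
            imputed.append(dict(rec, **{variable: rng.choice(pool)}))
    return imputed


# Invented toy data: number of rooms, stratified by a made-up stratum label.
records = [
    {"stratum": "x", "rooms": 3},
    {"stratum": "x", "rooms": 3},
    {"stratum": "x", "rooms": None},
    {"stratum": "y", "rooms": None},
]

imputed = impute_within_strata(
    records,
    variable="rooms",
    stratum_of=lambda r: r["stratum"],
    is_valid=lambda r: r["rooms"] is not None,
    rng=random.Random(0),
)
```

A record whose stratum contains no exact records is left unchanged here; in practice such cases would need wider strata or another method.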

Deletion of Records
In general, the E&I procedures provided for the deletion of records only in cases of clearly erroneous data due to 'apparent' (invented or randomly produced) compilations. The records entirely deleted for the sample surveys supporting the Italian PHC 2021 were very few in number.

The Family Procedure (FP)
The Family Procedure is software developed at Istat by computer scientists and statisticians for the control and correction of sample data from social surveys, and adapted to the needs of the permanent population census for the editing and imputation of family registry variables (relationships, marital status, year of marriage). It uses complex control algorithms based on the determination of potential couples within the household (with a system of scores given to all combinations of pairs of individuals in the family) and on combinations of variable strings, followed by deterministic correction. Once the family registry variables have been corrected, the procedure determines all the variables defining the family type and nuclei.

 

Methodologies adopted for the estimation of census topics

See sub-section "Methodologies used for any estimations, models or imputations" in section 18.3.3.

For more details on the estimation methods adopted in the 2021 PHC in Italy, see the ANNEX attached in this section.

Annexes:
Information on estimation methods adopted in 2021 PHC in Italy

At the core of the Italian Permanent Population and Housing Census (PPHC) is the Population Register (PR). Together with the Statistical Base Register of Addresses (RSBL) and with the thematic registers on education and employment, PR provides the basis for the production of population census data in a combined census design (census data are produced by using multiple sources).

Sampling data

Two ad hoc sample surveys (Area survey and List survey) are conducted annually to measure the quality of the fully register-based population count and to collect data for non-replaceable (or only partially replaceable) variables.

Therefore, concerning the census outputs, the PPHC produces:

  • fully register-based population count;
  • census hypercubes estimated by the joint use of information already available in registers and of data collected on the field, through the use of statistical models.

In the first cycle (2018-2021) of the PPHC, two ad hoc surveys are conducted annually in self-representative municipalities (i.e. those with a population over 17,800 inhabitants and smaller ones which do not rotate in the sampling scheme of the Labour Force Survey) and every four years, according to a rotation scheme, in non-self-representative municipalities (i.e. all the others). In each municipality, a sample of households is selected from the Population Register for the List survey and a sample of addresses from the Address Register for the Area survey.

Each year the surveys involve about 2,850 municipalities and a total of 1,400,000 households (950,000 for the List survey and 450,000 for the Area survey). In 2021 the number of municipalities involved was higher (4,531 of the 7,903 municipalities in Italy) because the number of non-self-representative municipalities was double that originally planned: the 2020 field surveys were cancelled due to the pandemic, so the municipalities due to participate in 2020 were moved to 2021. The households involved in the 2021 surveys were therefore 2,472,400 for the L survey and 776,097 for the A survey. The reference population is the population usually resident in Italy.
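The figures quoted above can be cross-checked with a few lines of arithmetic. The totals and the share below are derived only from the numbers in the text; the variable names are illustrative.

```python
# Households in a regular year, as stated in the text.
list_regular, area_regular = 950_000, 450_000
total_regular = list_regular + area_regular

# Households involved in the 2021 surveys.
list_2021, area_2021 = 2_472_400, 776_097
total_2021 = list_2021 + area_2021

# Share of Italian municipalities involved in 2021, in percent.
share_2021 = 100 * 4531 / 7903

print(total_regular, total_2021, round(share_2021, 1))
```

The regular-year components add up to the stated 1,400,000 households; in 2021 the two surveys together involved 3,248,497 households in about 57.3% of Italian municipalities.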

Administrative data

With reference to the administrative data sources used to produce the census data, either through an integration process or an estimation model, see:

  1. ANNEX attached in section 18.1.3 (List of data sources per topic);
  2. ANNEX attached in section 19 (Sheet 3. "Administrative data sources").

The metadata provided here refer mainly to the 2021 wave of the Permanent Population and Housing Census (end of the first cycle of the PPHC), but the frequency of dissemination is annual (the Permanent Census produces and releases yearly data).

The time lag between the census reference date and the first release of data for the basic socio-demographic characteristics is one year and currently cannot be reduced, given the time necessary for processing both survey and administrative data.

Data on the population by sex, age, citizenship, and education level by municipality were released about one year after the census reference date (15 December 2022). Data on current activity status and on households (no. of households by no. of components) were released in February 2023. The same data at enumeration area level were released in June 2023. 

The 15 December 2022 release included data on the hard-to-reach populations (institutional population, homeless and people living in formal/informal camps). Data on conventional dwellings (no. by occupancy status) were released in March 2023. Data on the population by migratory background were released in December 2023. The remaining data will be released in the course of 2024, after transmission of census hypercubes to Eurostat (31 March 2024).

No preliminary data were released.

The data are comparable across all the different geographical areas.

With regard to grid data, the definitions, concepts and classifications adopted are all in line with the provisions of Implementing Regulation 2018/1799.

The data are therefore fully comparable at European level.

The A and L sample surveys were conducted with the same methodology throughout the territory. The data were produced with the same methodology and using nationally available sources. 

The data are therefore also fully comparable between Italian regions.

Not applicable.