1.1. Contact organisation
Istat - Italian National Institute of Statistics
1.2. Contact organisation unit
Division for population census and social survey integration
1.3. Contact name
Confidential because of GDPR
1.4. Contact person function
Confidential because of GDPR
1.5. Contact mail address
Piazza Guglielmo Marconi, 24 - Rome, Italy
1.6. Contact email address
Confidential because of GDPR
1.7. Contact phone number
Confidential because of GDPR
1.8. Contact fax number
Confidential because of GDPR
2.1. Metadata last certified
17 December 2024
2.2. Metadata last posted
17 December 2024
2.3. Metadata last update
17 December 2024
3.1. Data description
The data set transmitted to Eurostat includes all the hypercubes requested under Regulation (EC) No 763/2008 and the following implementing regulations: Regulation (EU) 2017/543, Regulation (EU) 2017/712 and Regulation (EU) 2017/881.
All the definitions and classifications are compliant with the abovementioned Regulations. The few exceptions are detailed under the related topics.
3.1.1. Impact of the COVID-19 pandemic on census methodology
To replace the decennial census, in 2018 Istat launched the Permanent Population and Housing Census (PPHC), a ‘combined’ approach that integrates administrative data and sample surveys, with yearly data collection and dissemination of census-type data. The metadata provided here refer to the 2021 wave (end of the first cycle of the PPHC).
At the core of the PPHC is the statistical Population Register (PR). Together with the Statistical Base Register of Addresses (RSBL) and the thematic registers on education and employment, the PR provides the basis for producing population census data in a combined census design. Two ad hoc sample surveys (Area survey and List survey) are conducted annually to collect data for variables that cannot be replaced, or can only partially be replaced, by register data, and to collect data useful for the population count.
More precisely, concerning the population count, in the original PPHC design the capture-recapture model was adopted for direct estimates of the coverage errors of the PR, with the population register representing the ‘first capture’ and field data the ‘second capture’. The population count was then obtained by applying correction coefficients for under-coverage and over-coverage errors to individuals in the PR. In 2020, because of the COVID-19 pandemic, the surveys supporting the census had to be cancelled, so a fully register-based count was produced for the first time. Thanks to the availability of relevant information originating from administrative sources, the Signs of Life (SoL) method was applied to AIDA (SoL Archive) in order to produce the municipal population counts. SoL profiles (i.e. groups of individuals who are supposed to have a similar over/under-coverage behaviour) were defined based on experts’ knowledge. In 2021 this change was further consolidated and improved, thanks to the availability of both survey data and SoL, combining evidence resulting from a statistical model with expert knowledge.
For more information on the SoL method see section 18.6.
In this revised census design, survey data are used to measure the error of the fully register-based count, instead of being used for correcting the Population Register.
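As an illustration only, the capture-recapture idea behind the original PPHC design can be sketched with the classical two-sample Lincoln-Petersen estimator. This is a textbook simplification, not the model actually used by Istat: it covers under-coverage only, the function names are hypothetical and the counts are invented.

```python
def lincoln_petersen(n_register: int, n_survey: int, n_matched: int) -> float:
    """Estimate the true population size from two 'captures'.

    n_register: persons found in the register (first capture)
    n_survey:   persons found in the field survey (second capture)
    n_matched:  persons found in both sources
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is required")
    return n_register * n_survey / n_matched


def undercoverage_coefficient(n_survey: int, n_matched: int) -> float:
    """Correction factor to apply to the register count (hypothetical helper).

    The share of survey persons also found in the register estimates the
    register coverage rate; its inverse scales the register count up.
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is required")
    return n_survey / n_matched


# Invented figures: 1000 persons in the register, 200 in the survey, 180 in both.
estimate = lincoln_petersen(1000, 200, 180)
coefficient = undercoverage_coefficient(200, 180)
```

With these invented figures the register count is multiplied by 200/180, giving an estimate of roughly 1,111 persons. A real application would also have to handle over-coverage and matching errors, which this sketch ignores.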
3.2. Classification system
The following classification systems were used for the Italian PHC in 2021:
- Nomenclature of Territorial Units for Statistics (NUTS 2021)
- International Standard Classification of Education (ISCED-2011)
- International standard classification of occupations (ISCO-08)
- Statistical Classification of Economic Activities in the European Community (NACE Rev 2)
3.3. Coverage - sector
Not applicable.
3.4. Statistical concepts and definitions
The information is given separately for each census topic.
3.4.1. Statistical concepts and definitions - Usual residence
‘Usual residence’ is defined according to the definition in the EU Regulation. To implement this definition, the Signs of Life (SoL) method is applied to an extended population register, obtained by linking the population register to many other sources relevant for determining usual residence. For each individual in the extended population register, the presence of SoL is evaluated over a period of 24 months before the census reference date (one of the main criteria is that there must be signs of work or study for at least 12 of the 24 months). No methods are used to estimate the intention of staying.
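The 12-out-of-24-months criterion mentioned above could be checked along the following lines. This is a minimal sketch: the function names are hypothetical, signals are reduced to a set of (year, month) pairs, and the assumption that the 24-month window ends with the month of the reference date is ours, not stated in the source.

```python
from datetime import date


def months_window(reference: date, length: int = 24) -> set[tuple[int, int]]:
    """Return the (year, month) pairs of the `length` months ending with
    the month of the reference date (assumed window definition)."""
    months = set()
    year, month = reference.year, reference.month
    for _ in range(length):
        months.add((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return months


def meets_sol_criterion(signal_months: set[tuple[int, int]],
                        reference: date,
                        min_months: int = 12,
                        window: int = 24) -> bool:
    """True if work/study signals cover at least `min_months` distinct
    months inside the window before the reference date."""
    in_window = signal_months & months_window(reference, window)
    return len(in_window) >= min_months


reference_date = date(2021, 12, 31)
all_of_2021 = {(2021, m) for m in range(1, 13)}
is_resident_candidate = meets_sol_criterion(all_of_2021, reference_date)
```

In practice this is only one of several criteria, and the actual SoL profiles combine many administrative sources.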
3.4.2. Statistical concepts and definitions - Sex
‘Sex’ refers to the sex at birth, as registered in the Population Register maintained by Istat (Registro di Base degli Individui).
3.4.3. Statistical concepts and definitions - Age
The age reached in completed years at the reference date.
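For illustration, age in completed years at the reference date can be computed as below; the function name is hypothetical.

```python
from datetime import date


def age_in_completed_years(birth: date, reference: date) -> int:
    """Number of birthdays reached on or before the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


census_reference_date = date(2021, 12, 31)
age = age_in_completed_years(date(2001, 1, 1), census_reference_date)
```

A person born on 1 January 2001 has completed 20 years at 31 December 2021, turning 21 the following day.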
3.4.4. Statistical concepts and definitions - Marital status
Marital status is defined according to the individual’s de jure status, i.e. the legal conjugal status of an individual in relation to the Italian marriage laws. ‘Legally separated’ married partners or partners in a registered partnership are classified under ‘Married or in registered partnership’.
3.4.5. Statistical concepts and definitions - Family status
The family nucleus is defined in a narrow sense; that is as two or more persons who belong to the same household and who are related as husband and wife, as partners in a registered partnership, as partners in a consensual union, or as parent and child. Thus, a family comprises a couple without children or a couple with one or more children, or a lone parent with one or more children. This family concept limits relationships between children and adults to direct (first-degree) relationships, that is, between parents and children.
3.4.6. Statistical concepts and definitions - Household status
Private households are identified according to the ‘housekeeping concept’.
According to the housekeeping concept, a private household is either: (a) a one-person household, that is a person who lives alone in a separate housing unit or who occupies, as a lodger, a separate room (or rooms) of a housing unit but does not join with any of the other occupants of the housing unit to form part of a multiperson household as defined below; or (b) a multiperson household, that is a group of two or more persons who combine to occupy the whole or part of a housing unit and to provide themselves with food and possibly other essentials for living. Members of the group may pool their incomes to a greater or lesser extent.
With regard to “children under 15 years of age living alone”: the data show numerous young persons with an HST of ‘living alone’. These data are correct and refer to specific cases of minors under the care of a guardian, such as unaccompanied minors, or minors who, despite having parents, live in an institution and are recorded in the population register as a household.
3.4.7. Statistical concepts and definitions - Current activity status
‘Current activity status’ is the current relationship of a person to economic activity, with reference to the last seven days prior to the census reference date. For persons classified as in search of work, the reference period is the four weeks before the census reference date.
3.4.8. Statistical concepts and definitions - Occupation
Occupation refers to the type of work done in a job. ‘Type of work’ is described by the main tasks and duties of the work.
Persons are classified according to the occupation they have in their main job, defined as the one where they worked the longest hours or, where the same number of hours was worked in two or more jobs, the one from which the highest income is derived. The categories included in the breakdown 'occupation' correspond to the major groups of the ISCO-08 (COM) classification.
Persons under the age of 15 years, as well as persons aged 15 or over that were:
- not economically active during the reference week,
- unemployed in the reference week or
- unemployed, never worked before (i.e. they have never been employed in their lives)
are classified under 'not applicable'.
The allocation of a person within the breakdowns of the topics 'Occupation', 'Industry' and 'Status in employment' is based on the same job (see above).
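The main-job rule described above (longest hours, ties broken by highest income) can be sketched as follows. The `Job` record and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    occupation: str   # e.g. an ISCO-08 major group label
    hours: float      # hours worked in the reference week
    income: float     # income derived from the job


def main_job(jobs: list[Job]) -> Job:
    """Pick the job with the longest hours; if hours are equal,
    pick the one with the highest income."""
    if not jobs:
        raise ValueError("the person has no job")
    return max(jobs, key=lambda job: (job.hours, job.income))


jobs = [
    Job("Professionals", hours=20, income=900.0),
    Job("Clerical support workers", hours=20, income=1100.0),
    Job("Service and sales workers", hours=8, income=1500.0),
]
selected = main_job(jobs)
```

Here the two 20-hour jobs tie on hours, so the better-paid one is selected; the same job then drives the 'Occupation', 'Industry' and 'Status in employment' breakdowns.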
3.4.9. Statistical concepts and definitions - Industry
Industry (branch of economic activity) refers to the kind of production or activity of the establishment or similar unit in which the job of an employed person is located.
Persons doing more than one job have been allocated an industry (branch of economic activity) based on their main job, i.e. the one where they worked the longest hours or, where the same number of hours was worked in two or more jobs, the one from which the highest income is derived.
The breakdown by industry is available for persons aged 15 or over that were:
- employed during the reference week.
The categories included in the breakdown 'industry' list the 21 sections of the NACE Rev. 2 classification and appropriate aggregates.
3.4.10. Statistical concepts and definitions - Status in employment
An ‘employee’ is a person who works in a ‘paid employment’ job, that is a job where the explicit or implicit contract of employment gives the incumbent a basic remuneration, which is independent of the revenue of the unit for which he/she works. An ‘employer’ is a person who, working on his or her own account or with a small number of partners, holds a ‘self-employment’ job and, in this capacity, on a continuous basis (including the reference week) has engaged one or more persons to work for him/her as ‘employees’.
3.4.11. Statistical concepts and definitions - Place of work
The location of the place of work is the geographical area in which a currently employed person does his/her job.
The place of work of those mostly working at home is the same as their usual residence. The term ‘working’ refers to work done as an ‘employed person’ as defined under the topic ‘Current activity status’. ‘Mostly’ working at home means that the person spends all or most of the time working at home, and less, or no, time in a place of work other than at home.
Information on the place of work (i.e. the municipality or foreign state where the person works) is collected for persons who have a fixed place of work out of home, including persons who report to a fixed address at the beginning of their work period (such as bus drivers, airline crew, operators of street market stalls that are not removed at the end of the workday) and for persons who work at home.
The breakdown ‘No fixed place of work (inside or outside the Member State)’ includes persons without a fixed place of work, for example agricultural labourers who frequently change their place of work, or sales agents. It does not include persons who report to a fixed address at the beginning of their work period, e.g. bus drivers, airline pilots and mobile delivery service providers.
3.4.12. Statistical concepts and definitions - Educational attainment
Educational attainment refers to the highest level successfully completed in the educational system of the country where the education was received. All education which is relevant to the completion of a level is taken into account even if this was provided outside schools and universities.
3.4.13. Statistical concepts and definitions - Size of the locality
A locality is defined as a distinct population cluster that is an area defined by population living in neighbouring or contiguous buildings.
Such buildings may either:
(a) form a continuous built-up area with a clearly recognizable street formation; or
(b) though not part of such a built-up area, comprise a group of buildings to which a locally recognized place name is uniquely attached; or
(c) though not meeting either of the above two criteria, constitute a group of buildings, none of which is separated from its nearest neighbour by more than 200 meters.
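Criterion (c) can be approximated by grouping buildings so that each one lies within 200 meters of at least one other building in its group. The sketch below uses simple single-linkage grouping on planar coordinates expressed in meters; the function name, the coordinate system and the treatment of isolated buildings as one-element groups are all assumptions made for illustration, not the procedure used for the census.

```python
from math import hypot


def cluster_buildings(points: list[tuple[float, float]],
                      max_gap: float = 200.0) -> list[list[int]]:
    """Group building indices so that buildings closer than `max_gap`
    meters end up in the same group (single-linkage, union-find)."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if hypot(dx, dy) <= max_gap:
                parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Four invented buildings along a road, coordinates in meters.
buildings = [(0.0, 0.0), (150.0, 0.0), (300.0, 0.0), (1000.0, 0.0)]
localities = cluster_buildings(buildings)
```

The first three buildings are chained by 150-meter gaps and form one group, while the fourth, 700 meters from its nearest neighbour, stays on its own. The pairwise comparison is quadratic in the number of buildings, so a spatial index would be needed at national scale.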
3.4.14. Statistical concepts and definitions - Place of birth
Information on the ‘Place of birth’ is collected according to the place in which the birth took place, on the basis of international boundaries existing on 31 December 2021.
3.4.15. Statistical concepts and definitions - Country of citizenship
A person with two or more citizenships is allocated to only one country of citizenship, according to the following order of precedence:
- citizenship of the reporting country (Italian citizenship)
- if the person does not have the Italian citizenship: other EU Member State
- if the person does not have the citizenship of another EU Member State: other country outside the European Union.
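The order of precedence above can be sketched as a small function. Country codes are ISO 3166-1 alpha-2 codes of the 27 EU Member States at the 2021 reference date; the function name is hypothetical, and returning the first listed citizenship when several fall in the same class is an assumption, since the text does not specify a tie-break.

```python
EU_MEMBER_STATES = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
}


def reported_citizenship(citizenships: list[str],
                         reporting_country: str = "IT") -> str:
    """Reduce multiple citizenships to one, following the precedence:
    reporting country, then other EU Member State, then other country."""
    if not citizenships:
        raise ValueError("no citizenship recorded")
    if reporting_country in citizenships:
        return reporting_country
    for code in citizenships:
        if code in EU_MEMBER_STATES:
            return code  # assumption: first listed EU citizenship wins
    return citizenships[0]  # assumption: first listed non-EU citizenship wins


dual_citizen = reported_citizenship(["US", "IT"])
eu_citizen = reported_citizenship(["BR", "FR"])
third_country = reported_citizenship(["US", "BR"])
```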
3.4.16. Statistical concepts and definitions - Year of arrival in the country
The year of arrival is the calendar year in which a person most recently established usual residence in the country.
The data for 2021 refer to the time span between 1 January 2021 and the reference date.
3.4.17. Statistical concepts and definitions - Residence one year before
The relationship between the current place of usual residence and the place of usual residence one year prior to the census.
For all persons that have changed their usual residence more than once within the year prior to the reference date, the previous place of usual residence is the last usual residence from which they moved to their current place of usual residence.
3.4.18. Statistical concepts and definitions - Housing arrangements
Housing arrangements are identified in line with the Regulation definition.
The homeless (persons who are not usual residents in any living quarter category) are persons living in the streets without a shelter that would fall within the scope of living quarters (primary homelessness) and persons who do not have a fixed place of usual residence (nomads, vagrants, etc.).
3.4.19. Statistical concepts and definitions - Type of family nucleus
The definition of family nucleus is in line with the Regulation definition. A child who alternates between two households (for instance if his or her parents are divorced) shall consider the one where he or she spends the majority of the time as his or her household. Where an equal amount of time is spent with both parents, the household shall be the one where the child has his or her legal or registered residence.
The term 'Couples' includes married couples, couples in registered partnerships, and couples who live in a consensual union.
'Skip-generation households' (households consisting of a grandparent or grandparents and one or more grandchildren, but no parent of those grandchildren) are not included in the definition of a family.
3.4.20. Statistical concepts and definitions - Size of family nucleus
The family nucleus is defined as two or more persons who belong to the same household and who are related as husband and wife, as partners in a registered partnership, as partners in a consensual union, or as parent and child. Thus, a family comprises a couple without children or a couple with one or more children, or a lone parent with one or more children. This family concept limits relationships between children and adults to direct (first-degree) relationships, that is, between parents and children.
3.4.21. Statistical concepts and definitions - Type of private household
Private households are identified using the ‘housekeeping concept’. According to the housekeeping concept, a private household is either: (a) a one-person household, that is a person who lives alone in a separate housing unit or who occupies, as a lodger, a separate room (or rooms) of a housing unit but does not join with any of the other occupants of the housing unit to form part of a multiperson household as defined below; or (b) a multiperson household, that is a group of two or more persons who combine to occupy the whole or part of a housing unit and to provide themselves with food and possibly other essentials for living. Members of the group may pool their incomes to a greater or lesser extent.
3.4.22. Statistical concepts and definitions - Size of private household
Private households are identified using the ‘housekeeping concept’.
According to the housekeeping concept, a private household is either: (a) a one-person household, that is a person who lives alone in a separate housing unit or who occupies, as a lodger, a separate room (or rooms) of a housing unit but does not join with any of the other occupants of the housing unit to form part of a multiperson household as defined below; or (b) a multiperson household, that is a group of two or more persons who combine to occupy the whole or part of a housing unit and to provide themselves with food and possibly other essentials for living. Members of the group may pool their incomes to a greater or lesser extent.
'Primary homeless persons' are identified based on the Population Register (data on homeless registered in the Administrative Population Register are extracted from the Italian National Administrative Population Register and submitted to municipalities for validation). Data on homeless include persons who do not have a fixed place of usual residence (nomads, vagrants) as, according to Italian legislation, both categories are registered as homeless in the Population Register.
3.4.23. Statistical concepts and definitions - Tenure status of households
‘Tenure status of households’ refers to the arrangements under which a private household occupies all or part of a housing unit.
3.4.24. Statistical concepts and definitions - Type of living quarter
A living quarter is housing which is the usual residence of one or more persons.
'Conventional dwellings' are structurally separate and independent premises at fixed locations which are designed for permanent human habitation and are, at the reference date, either used as a residence, or vacant, or reserved for seasonal or secondary use.
'Separate' means surrounded by walls and covered by a roof or ceiling so that one or more persons can isolate themselves. 'Independent' means having direct access from a street or a staircase, passage, gallery or grounds.
'Other housing units' are huts, cabins, shacks, shanties, caravans, houseboats, barns, mills, caves or any other shelter used for human habitation at the time of the census, irrespective of whether it was designed for human habitation.
'Collective living quarters' are premises which are designed for habitation by large groups of individuals or several households and which are used as the usual residence of at least one person at the time of the census.
'Occupied conventional dwellings', 'other housing units' and 'collective living quarters' together represent ‘living quarters'. Any 'living quarter' must be the usual residence of at least one person.
3.4.25. Statistical concepts and definitions - Occupancy status
‘Occupied conventional dwellings’ are conventional dwellings which are the usual residence of one or more persons at the time of the census. ‘Unoccupied conventional dwellings’ are conventional dwellings which are not the usual residence of any person at the time of the census.
3.4.26. Statistical concepts and definitions - Type of ownership
‘Type of ownership’ refers to the ownership of the dwelling and not to that of the land on which the dwelling stands. It shows the tenure arrangements under which the dwelling is occupied.
'Owner-occupied dwellings' are those where at least one occupant of the dwelling owns parts or the whole of the dwelling. 'Cooperative ownership' refers to ownership within the framework of a housing cooperative.
'Rented dwellings' are those where at least one occupant pays a rent for the occupation of the dwelling, and where no occupant owns parts or the whole of the dwelling.
3.4.27. Statistical concepts and definitions - Number of occupants
The number of occupants of a housing unit is the number of people for whom the housing unit is the usual residence.
3.4.28. Statistical concepts and definitions - Useful floor space
Useful floor space is defined as the floor space measured inside the outer walls excluding non-habitable cellars and attics and, in multi-dwelling buildings, all common spaces; or the total floor space of rooms falling under the concept of 'room'.
3.4.29. Statistical concepts and definitions - Number of rooms
A ‘room’ is defined as a space in a housing unit enclosed by walls reaching from the floor to the ceiling or roof, of a size large enough to hold a bed for an adult (4 square meters at least) and at least 2 meters high over the major area of the ceiling.
3.4.30. Statistical concepts and definitions - Density standard (floor space)
The topic ‘Density standard (floor space)’ relates the useful floor space in square meters to the number of occupants, as specified under the topic ‘Number of occupants’.
3.4.31. Statistical concepts and definitions - Density standard (number of rooms)
The topic ‘Density standard (number of rooms)’ relates the number of rooms to the number of occupants, as specified under the topic ‘Number of occupants’.
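Both density standards are simple ratios per occupant, as the following sketch shows. The function names are hypothetical and the figures are invented.

```python
def density_floor_space(useful_floor_space_m2: float, occupants: int) -> float:
    """Useful floor space in square meters per occupant."""
    if occupants <= 0:
        raise ValueError("density is only defined for occupied housing units")
    return useful_floor_space_m2 / occupants


def density_rooms(rooms: int, occupants: int) -> float:
    """Number of rooms per occupant."""
    if occupants <= 0:
        raise ValueError("density is only defined for occupied housing units")
    return rooms / occupants


# Invented example: an 80 square meter dwelling with 3 rooms and 4 occupants.
floor_space_density = density_floor_space(80.0, 4)
room_density = density_rooms(3, 4)
```

The example dwelling offers 20 square meters and 0.75 rooms per occupant; the resulting values would then be assigned to the classes of the corresponding breakdown.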
3.4.32. Statistical concepts and definitions - Water supply system
Whether the conventional dwelling is equipped with piped water.
3.4.33. Statistical concepts and definitions - Toilet facilities
Whether the conventional dwelling is equipped with toilet facilities.
3.4.34. Statistical concepts and definitions - Bathing facilities
Whether the conventional dwelling is equipped with bathing facilities.
3.4.35. Statistical concepts and definitions - Type of heating
Conventional dwelling is considered as centrally heated if heating is provided either from a community heating centre or from an installation built in the building or in the conventional dwelling, established for heating purposes, without regard to the source of energy.
3.4.36. Statistical concepts and definitions - Type of building
The topic ‘Dwellings by type of building’ refers to the number of dwellings in the building in which the dwelling is placed.
3.4.37. Statistical concepts and definitions - Period of construction
The topic ‘Dwellings by period of construction’ refers to the year when the building in which the dwelling is placed was completed.
3.5. Statistical unit
The EU programme for the 2021 population and housing censuses includes data on persons, private households, family nuclei, conventional dwellings and living quarters.
3.6. Statistical population
The persons enumerated in the 2021 census are those who were usually resident on the Italian territory at the census reference date.
3.7. Reference area
Data are available at different levels of geographical detail in EU countries: national, NUTS2/NUTS3 regions and local administrative units (LAU), grids.
3.8. Coverage - Time
Data refer to the situation in the reporting country at the census reference date.
3.9. Base period
Not applicable.
Counts of statistical units should be expressed in numbers and, where needed, as rates per inhabitant enumerated in the country.
See the following sub-concepts.
5.1. EU census reference date
31 December 2021
5.2. National census reference date
31 December 2021
5.3. Differences between reference dates of national and EU census publications
No differences.
6.1. Institutional Mandate - legal acts and other agreements
The mission of the National Statistical Institute is to serve the community through the production and communication of timely, high-quality statistical information, analyses and forecasts. This purpose must be carried out in full autonomy and on the basis of rigorous ethical-professional principles and the most advanced scientific standards (as required by the Comstat Directive n. 12/2021: Adoption of the Italian Code for the Quality of Official Statistics).
Since 1989 Istat has played a role of direction, coordination, technical assistance and training within the National Statistical System (Sistan). The system was established with d.lgs. 322/1989, as amended by Presidential Decree no. 166/2010, to rationalize the production and dissemination of information and optimize the resources allocated to official statistics. Presidential Decree no. 166/2010 defines the Institute as a public body with scientific, organizational, financial and accounting autonomy, which carries out its activities according to the principles of scientific independence, impartiality, objectivity, reliability, quality and confidentiality of statistical information established at European and international level. Furthermore, with the entry into force of Legislative Decree no. 218/2016, Istat was also officially listed among the public research bodies (EPR).
For the most part, the object of production is established by the European Statistical Program (Pse) and the National Statistical Program (Psn), respectively adopted by acts of the Council and the European Parliament and the President of the Italian Republic; the production methods are instead established by the European Statistics Code and the Italian Code of Official Statistics, with a supervision carried out by Eurostat and the Commission for the Guarantee of Statistical Information (Cogis), respectively.
National legal background for the population and housing censuses, as requested by Reg. 2017/881, Annex point 1.1.:
- Legislative Decree 18 October 2012, n. 179, art. 3 – Further urgent measures for the growth of the country, converted with amendments by law 17 December 2012, n. 221, which first introduced the permanent census
- Decree of the President of the Council of Ministers of 12 May 2016 regarding the population census and national archive of house numbers and urban streets (ANNCSU), published in the Official Journal no. 167 of 19 July 2016, relating to implementation times
- Law 27 December 2017, n. 205 – State budget forecast for the financial year 2018 and multi-year budget for the three-year period 2018-2020 (art. 1, paragraphs 227-237)
- National statistical program for the three-year period 2020-2022 - 2022 update approved by Decree of the President of the Republic 11 July 2023 (IST-02493, Integrated System of Permanent Census and Social Surveys, area component and IST-02494 Integrated System of Permanent Census and Social Surveys, component from the list).
6.1.1. Bodies responsible
Istat
6.2. Institutional Mandate - data sharing
Not applicable.
7.1. Confidentiality - policy
Several national legal acts guarantee the confidentiality of data requested for statistical purposes. In Italy, according to art. 9, paragraph 1 of Legislative Decree n. 322 of 1989 (concerning the statistical system), statistical data can be disseminated only in aggregated form, so that it is impossible to identify the person to whom the information relates. The data collected can only be used for statistical purposes.
Official statistics must also safeguard the rights, basic freedoms, and dignity of respondents, in particular with regard to the right of confidentiality and personal identity.
Istat ensures the protection of personal data according to the General Data Protection Regulation (Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, repealing Directive 95/46/EC) and, as national legislation, the Italian Data Protection Code (Legislative Decree no. 196/2003) and the Code of conduct and professional practice applying to the processing of personal data for statistical and scientific research purposes within the framework of the national statistical system.
In order to make statistical secrecy and protection of personal data effective, Istat is currently taking appropriate organizational, logistical, methodological and statistical measures in accordance with internationally established standards.
Moreover, Legislative Decree n. 322 of 1989, art. 6 and 6 bis provides that the exchange of microdata and personal data within the National Statistical System (Sistan) is possible if it is necessary to fulfil requirements provided by the National Statistical Programme.
Finally, in implementation of art. 5-ter of the legislative decree 14 March 2013, no. 33, the new "Guidelines for the access for scientific purposes to the elementary data of the National Statistical System" establish the conditions under which the bodies and offices of the National Statistical System can allow researchers to access their own elementary data for scientific purposes.
7.2. Confidentiality - data treatment
With regard to register data, all administrative sources received by Istat, before being used for any production process, are pre-processed to assign a SIM code to each record in the source in order to allow the pseudonymisation of the information and protect confidentiality (see section 18.5).
For the release of census results, Istat has not applied any procedure for the protection of confidentiality, as national legislation (to date) exempts the census from this type of constraint. Therefore, data are released without aggregation of cells/categories or suppression of values below a predefined threshold.
8.1. Release calendar
No calendar has been set for the release of the census results, neither for those to be delivered to Eurostat nor for those for Italian dissemination.
8.2. Release calendar access
On the Istat website, there is no reference to the publication calendar as it has not been defined (as reported in section 8.1).
8.3. Release policy - user access
In line with the Community legal framework and the European Statistics Code of Practice, Eurostat disseminates European statistics on its website respecting professional independence and in an objective, professional and transparent manner in which all users are treated equitably.
On the occasion of each data release, Istat issues a press release explaining the contents of the specific release, the information level and the territorial level of the data (for more details see section 10.1). Usually the release contains the link to the Census web page on the Istat website where it is possible to consult and download the data (for the different types of output: tables; maps), the information note and the methodological note that usually accompany the release.
The results can be consulted through two different systems: the DW and the Data Browser (section 10.1 provides the relevant links).
With reference to European dissemination, Istat has stated at numerous national and international events (conferences and working meetings) that it is committed to producing and releasing the data requested by Eurostat by 31 March 2024. As soon as Eurostat releases the data, the news will be published on the Censuses page of Istat's institutional website.
For national dissemination, the release date has been set around mid-December of the year following the census year for population count data at municipal level and for the main structural variables (sex and age).
For other census information, no release schedule has been defined. Users, especially regular users, have very clear expectations of the results that will be released during the course of the year. Users are also accustomed to contacting the Istat Contact Centre for information on release times and the types of data that will be released.
The metadata provided here refer mainly to the 2021 wave of the Permanent Population and Housing Census (end of the first cycle of the PPHC), but the frequency of dissemination is annual (the Permanent Census produces and releases yearly data).
See the following sub-concepts.
10.1. Dissemination format - News release
Census data are disseminated yearly on the Istat website and released together with accompanying press releases. The main press release, concerning the main demographic data (sex, age, citizenship) and the breakdown by education, is planned for the end of every year.
The first press release linked to the 2021 census data was published one year after the reference date (15 December 2022). It also included data on the 'hard-to-count' populations (institutional population, homeless and people living in formal or informal camps). Attached (PR1) is the press release (in Italian) that accompanied the publication of the data.
A second press release accompanied the dissemination of data at enumeration area level and for the sub-municipal areas of the largest municipalities (9 June 2023). Attached (PR2) is the press release (in Italian) that accompanied the publication of the data. GIS interactive maps are also available for longitudinal data (1951-2019).
Subsequently, 20 press releases at NUTS 2 level were published, starting on 18 September 2023 (see e.g. the press release for Campania at this website). Attached, as an example, is the press release (in Italian) that accompanied the publication of the data for Campania (PR3).
Istat also released the distribution of the population for the 2021 census on the regular grid, with cells of 1 square kilometre. The processing was carried out using the European grid released by Eurostat.
The dissemination of 2021 census data is still going on and will continue after the release of the data through the Census Hub. The graphic visualization of some of the NUTS 2 level data is also possible through the dashboard available at the following link. Attached (PR4) is the press release (in Italian) that accompanied the release of the dashboard for the graphical display of some of the NUTS 2 level data.
Annexes:
First Press Release
Second Press Release
Press Release at the NUTS 2 level
Press Release Dashboard
10.2. Dissemination format - Publications
No publications have been released so far (nor are currently planned).
10.3. Dissemination format - online database
There are two points of access to census data: the permanent censuses data warehouse and the ad hoc population census data browser.
The data at EA level are available at the following link and the data at the grid level at the following link.
The above-mentioned information is listed on the census results page (please refer to the Italian version, as the English one is incomplete).
Regarding the number of accesses to online databases, only the number of pages viewed on the data browser of the population census can be given. The number of page views from the opening of the data browser (15 December 2020) to the end of February 2024 is approximately 856,300 pages.
10.4. Dissemination format - microdata access
No census microdata have been released so far.
Census microdata are accessible, upon request, only to the municipalities and Prefectures (i.e. the institutions involved in the field enumeration), and only for a limited number of variables: sex, age, country of citizenship, place of birth, level of education, current activity status (employed YES/NO), enumeration area, number of persons in the private/institutional household, and identifying personal/household codes. Each local authority can access only the data for its own territory (its own municipality or, for a Prefecture, all the municipalities in the province). These administrations are bound to use the data only for statistical processing in pursuit of their institutional functions.
10.5. Dissemination format - other
Not applicable.
10.6. Documentation on methodology
The main press releases (discussed and attached in section 10.1) are accompanied by methodological documents.
With reference to the first press release related to the 2021 census data (Annex PR1 in section 10.1), the related methodological note (MN1) that accompanied the publication of the data is attached here.
With reference to the second press release on the release of the sub-municipal data of the 2021 Census (Annex PR2 in section 10.1), the information note (IN2) and methodological note (MN2) that accompanied the release of the data are attached.
With reference to the population data on the regular grid, with cells of 1 square kilometre, the methodological note (MN4) that accompanied the release of the data is attached.
Annexes:
Methodological Note
Informative Note
Methodological Note
Methodological Note
10.7. Quality management - documentation
Currently the main documentation on quality management and quality assessment is the information provided to Eurostat (Quality Report - ESS Metadata Handler). A Working Group is currently working to produce the national Quality Report.
11.1. Quality assurance
Since the 1990s Istat has adopted a systematic approach to ensure the quality of statistical information and of its services to the community. With the aim of strengthening the commitment to quality, in 2020 Istat set up the Quality Committee to oversee all quality initiatives in the Institute. In addition, the role of Quality Manager was formally established.
In 2021 a new quality policy for statistical production was adopted. It is consistent with the European quality framework developed by Eurostat, and transposes its main principles and definitions. The endorsement in 2005 of the European Statistics Code of Practice (last revised in 2017) established the principles to be applied in order to ensure and strengthen both trust in, and the quality of, the European Statistical System. The principles of the code are largely inspired by the Fundamental Principles of Official Statistics adopted by the General Assembly of the United Nations in 2014 and developed by the Conference of European Statisticians in 1991. Further information is available on the Istat website, quality section.
Quality assurance procedures applied in Census:
Control on interviewers: Training course for interviewers, Drafting an interviewer instruction manual, Training course for staff of data collecting bodies/institutions, Establishing a Help Desk to support interviewers during field operations.
Control on unit nonresponse: Survey presentation letter signed by the Istat President, Guarantees on statistical confidentiality, Written description of survey objectives, Telephone contacts to make an appointment for the interview, Advertisement of the survey in the media, Establishing a toll-free line or telephone number for further explanations, Administrative fines for nonrespondents, Publication of the "Survey Information" for respondents on the Istat website, Mail follow-ups.
Data validation activities: Coherence control with previous data, Coherence control with data from other sources. For more details see Section 18.4 (Data validation).
11.2. Quality management - assessment
The census data production process incorporates the management of quality, as requested by Regulation 2017/881, Annex point 3.3.2.
More details concerning the census coverage assessment are available under 11.2.1.
Estimated under-coverage is about 0.25%.
Estimated over-coverage is about 1.44%.
The census variables have been estimated through model-based estimates, using both survey data and register data.
Editing and imputation have been performed on survey data through different methods. The methods used for groups of units or topics are presented below.
The following imputation methods were adopted for the topics on "Individuals":
- deterministic corrections to resolve systematic errors that emerged from the comparison of the surveyed data with administrative sources;
- deterministic corrections to resolve internal inconsistencies that emerged from the violation of compatibility rules;
- probabilistic imputations from marginal distributions to resolve residual random errors and partial non-response;
- imputation from minimum distance donor to resolve residual random errors and partial non-response.
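As an illustration of the last method in the list above, the following minimal sketch imputes a missing value by copying it from the closest complete record (the donor). The matching variables, the distance function and the sample records are hypothetical and are not taken from the Istat procedure.

```python
def nearest_donor_impute(recipient, donors, match_vars, target):
    """Return a copy of `recipient` with `target` copied from the closest donor."""

    def distance(a, b):
        # Hypothetical distance: number of matching variables that differ.
        return sum(a[v] != b[v] for v in match_vars)

    # Only records with an observed target value can act as donors.
    complete = [d for d in donors if d.get(target) is not None]
    donor = min(complete, key=lambda d: distance(recipient, d))

    imputed = dict(recipient)  # leave the original record untouched
    imputed[target] = donor[target]
    return imputed


# Hypothetical records, for illustration only.
donors = [
    {"sex": "F", "age_class": "30-39", "region": "Lazio", "education": "tertiary"},
    {"sex": "M", "age_class": "60-69", "region": "Campania", "education": "primary"},
]
recipient = {"sex": "F", "age_class": "30-39", "region": "Campania", "education": None}

result = nearest_donor_impute(
    recipient, donors, ["sex", "age_class", "region"], "education"
)
print(result["education"])
```

In this example the first donor differs from the recipient on one matching variable and the second on two, so the value is taken from the first donor.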
The following imputation procedures were adopted for the topics on "Households":
- deterministic corrections to resolve systematic errors that emerged from the data analysis;
- Households procedure (Istat internal procedure);
- deterministic corrections, or corrections from administrative sources, to resolve systematic errors that emerged from the violation of compatibility rules;
- probabilistic imputations from marginal distributions to resolve residual random errors and partial non-response;
- manual corrections.
For the topics on "Housing", imputations were carried out using a probabilistic method respecting the distribution of values in the subset of exact records, and for each variable, the strata within which the imputation should be carried out were identified.
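A minimal sketch of probabilistic imputation within strata is given below, assuming that a missing value is replaced by a random draw from the values observed in the same stratum, so that the distribution of the exact records is preserved in expectation. The variable names, the strata and the seed are hypothetical; the actual Istat procedure is not described in enough detail here to be reproduced.

```python
import random
from collections import defaultdict


def impute_within_strata(records, target, stratum_var, seed=0):
    """Fill missing `target` values with random draws from the same stratum."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible

    # Collect the observed (exact) values of the target variable per stratum.
    observed = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            observed[rec[stratum_var]].append(rec[target])

    imputed = []
    for rec in records:
        rec = dict(rec)
        if rec[target] is None:
            # Assumes every stratum has at least one exact record;
            # otherwise rng.choice raises IndexError.
            rec[target] = rng.choice(observed[rec[stratum_var]])
        imputed.append(rec)
    return imputed


# Hypothetical dwellings, stratified by a hypothetical building-period class.
dwellings = [
    {"period": "pre-1970", "heating": "central"},
    {"period": "pre-1970", "heating": "none"},
    {"period": "pre-1970", "heating": None},
    {"period": "post-1970", "heating": "autonomous"},
    {"period": "post-1970", "heating": None},
]
completed = impute_within_strata(dwellings, "heating", "period")
```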
Response rates are available on request, but no census data are produced based only on survey data (all the variables are derived either from registers or from model-based estimates).
For the data validation phase see section 18.4.
11.2.1. Coverage assessment
The population count is fully register-based. The statistical PR is corrected for both over-coverage and under-coverage through the Signs of life methodology (more details are available under 18.6).
On the other hand, an independent coverage assessment of the register-based count has to be performed. This is currently done by using survey data (i.e. by comparing register estimates and survey estimates), but the surveys currently used have not been designed for this purpose.
More precisely, both the List survey (for over-coverage) and the Area survey (for under-coverage) are being used. The Area survey is conducted on a sample of addresses drawn from the Register of addresses, canvassed “blindly” (i.e. by door-to-door field enumeration) to enumerate every individual with the CAPI technique. It was conducted in October 2021 on a sample of 3560 municipalities (the total number of municipalities in Italy is 7904) for a total number of 776,097 enumerated households. The List survey was conducted in October 2021 on a sample of 4531 municipalities for a total number of 2,472,400 enumerated households.
A sounder Audit survey approach will be implemented starting from 2025.
11.2.2. Post-enumeration survey(s)
We perform a fully register-based count and we use field surveys to estimate the error of the count i.e. to estimate the Mean Square Error. We don't conduct a PES.
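For reference, the Mean Square Error mentioned above follows the standard decomposition into variance and squared bias. The notation is introduced here for illustration only: \hat{N} denotes the register-based count and N the true (unknown) usually resident population.

```latex
\mathrm{MSE}(\hat{N}) = \mathbb{E}\big[(\hat{N} - N)^2\big]
                      = \mathrm{Var}(\hat{N}) + \big(\mathbb{E}[\hat{N}] - N\big)^2
```

The second term is the squared bias, which is the component that net coverage errors in the register would contribute.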
12.1. Relevance - User Needs
With reference to the classification of users who made requests on the PHC data, 619 requests for data and services were received by the Istat Contact Centre over the last two years, which concerned:
- Assistance in searching for statistical data: 492 cases
- Request for customised processing: 41 cases
- Request for information and methodological notes: 36 cases
- Media support: 50 cases
Among the types of users of the Contact Centre, the following groups were found (% figure):
- School, university and research (teachers, researchers, data scientists): 36.4%
- Public administration (excluding school and university): 14.1%
- Education (students): 9.5%
- Enterprises and freelancers: 28.9%
- Third sector (private non-profit organisations): 3.3%
- International institutes and organisations: 1.0%
- Media and press offices: 1.0%
- Political parties, trade unions: 0.3%
- Private citizens/pensioners/no organisation: 5.6%
Since we did not conduct a specific user survey, we have no information concerning either the use of the data and information requested or an assessment of whether the results were achieved. Furthermore, we have no information on the cases of non-fulfilment of the data, either with regard to information detail or data quality. Nor do we have any information on the future needs of users of PHC data.
The only information available concerns the results of a user satisfaction survey conducted by Istat in 2021 on a group of 'specialised users' who have in the past requested data and customised processing related to the 2011 PHC at the census section level. [Carbonetti, G., A. Ciccarese, and R. Roncati (2023). An analysis of the demand for sub-municipal data from the Population and Housing Census. Review of official statistics. N. 1-2-3/2023. Rome, Italy: Istat]. The results are presented in detail in section 12.2.
12.2. Relevance - User Satisfaction
With regard to user satisfaction and opinions, since 2013 Istat has been conducting an annual survey on user satisfaction with the products and services offered through its institutional website. The surveys conducted in 2022 and 2023 show that there is a high level of satisfaction with the products offered on the ISTAT website:
- Year 2023 - 85.8% of respondents said they were "very" or "moderately satisfied", 14.2% were "slightly" or "not at all satisfied";
- Year 2022 - 84.6% of the respondents stated that they were "very" or "moderately satisfied", 15.4% were "slightly" or "not at all satisfied".
In this section, we propose the results of a specific user satisfaction survey conducted by Istat in 2021 on a group of 'specialised users' who in the past have requested data and customised processing related to the 2011 PHC at the census section level. [Carbonetti, G., A. Ciccarese, and R. Roncati (2023). An analysis of the demand for sub-municipal data from the Population and Housing Census. Review of official statistics. N. 1-2-3/2023. Rome, Italy: Istat].
The survey was conducted using a CAWI technique in a comprehensive manner, on a specific target group of users with the aim of assessing the match between user demand and the supply of reliable and timely statistics and evaluating expectations towards future releases of census data. Questions were also asked about the user support service offered by Istat's Contact Centre which, in the extensive and articulated data dissemination system, represents the main channel for requesting customised processing.
The results of the survey provide a consolidated positive opinion among the most specialised users, many of whom have become loyal over time.
Among the main results, it should be noted that over 90% of respondents were satisfied with the level of information detail of the data received (28.1% of users said they were fully satisfied, especially those who used the data for commercial purposes, while 62.5% would have liked more information detail, especially those who requested the data for analysis or research purposes). On the other hand, only 9.4% of users were dissatisfied due to the inadequacy of the spatial level or the inability to integrate the data with the information received.
With regard to the quality of the data received, more than 50% of the respondents were fully satisfied, 41.6% were only partially satisfied and 6.3% were dissatisfied. Cross-referencing the answers with the reason for the request, the users who used the data for analysis or research purposes were those who would have needed data of higher quality. This indicates that academic and research users have a strong focus on, and high expectations of, data quality, precisely because of the specificity of their study and analysis objectives.
Finally, about 85 per cent of the respondents appreciated the service of the Istat Contact Centre at all stages of the management of the customised processing request. The critical points highlighted were excessive bureaucracy, long waiting times and the high cost of the data (in the case of paid data).
In general, to improve the service, users suggested greater accessibility to the micro-data, the possibility of interacting with the ISTAT technical staff who produced the data, and lower costs for students, PhD students and young people without stable employment.
12.3. Completeness
All the data foreseen by Reg. 763/2008 have been/are being produced. The main socio-demographic data have been made available to national users through several access points (see 10.1 and 10.3). Additional topics will be released after the release to Eurostat.
13.1. Accuracy - overall
There are no particular reasons for census data unreliability.
With reference to usual residence, the Signs of Life methodology has been used to estimate the over-coverage and under-coverage of the population register, while the error (mean squared error) of the register-based count has been measured through the field surveys.
For more details on the methodology used to estimate the data on the different topics, see the following annexes:
- ANNEX attached in section 18.5;
- ANNEX attached in section 19 (Sheet 2. "Data Sources").
13.1.1. Overall accuracy - Usual residence
The census population count is fully register-based. The basic infrastructure is the Statistical Population Register, whose main administrative data sources are the population registers of Italian municipalities (merged in a centralized administrative population register: ANPR).
This register provides the basis of the fully register-based population count but, in order to obtain the official census count, it is ‘corrected’ by using the Signs of Life methodology (see section 18.6). More precisely, classification rules (deterministic criteria) are applied to data relating to two consecutive years (the two years preceding the census reference date, i.e. the period from 1 January 2020 to 31 December 2021 for the 2021 wave of the Permanent Population Census). This choice was made on the basis of the usual residence definition of Regulation (EC) No 763/2008, i.e. the 12-month criterion. Based on this definition, the production process takes into consideration SoL of at least 12 months observed over the 24-month period preceding the reference date.
As a result of this process, the statistical PR is corrected for both over-coverage and under-coverage i.e.:
- individuals with SoL but not resident according to ANPR, are added to the population count (under-covered individuals);
- individuals resident according to ANPR but without SoL in other administrative sources are removed from the population count (over-covered individuals).
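The correction logic described above can be sketched as a simple deterministic rule. This is a deliberately simplified illustration: the function name, the labels and the way months are counted (a total number of months with signs of life within the 24-month window) are assumptions made here, and the actual rules rely on SoL profiles that are not reproduced in this sketch.

```python
def classify_individual(in_anpr, months_with_sol, min_months=12):
    """Classify one individual for the register-based population count.

    in_anpr         -- True if the person is resident according to ANPR
    months_with_sol -- months with signs of life in the 24 months
                       preceding the reference date (hypothetical input)
    min_months      -- threshold derived from the 12-month criterion
    """
    has_sol = months_with_sol >= min_months
    if has_sol and not in_anpr:
        return "add"      # under-covered: counted although not registered
    if in_anpr and not has_sol:
        return "remove"   # over-covered: registered but no signs of life
    if in_anpr and has_sol:
        return "confirm"  # registered and confirmed by signs of life
    return "exclude"      # neither registered nor enough signs of life


# Hypothetical cases, for illustration only.
cases = [(True, 24), (True, 3), (False, 14), (False, 2)]
labels = [classify_individual(anpr, months) for anpr, months in cases]
print(labels)
```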
Furthermore, in order to measure the error of the fully register-based count, the Area survey is used to calculate the Mean Square Error.
Third level students whose term-time address is not the one of their family home have been considered to have their usual residence at their family home if they meet the criteria for usual residence in Italy (i.e., as any other group, they are confirmed at the address where they are registered according to ANPR if the SoL method confirms their usual residence in Italy).
13.1.2. Overall accuracy - Sex
There are no particular reasons for data unreliability for this topic.
13.1.3. Overall accuracy - Age
There are no particular reasons for data unreliability for this topic.
Data on the age of individuals are derived from the date of birth information contained in the Population Register and the ANPR (National Register of Resident Population). Due to the completeness and reliability of this information, the accuracy of age-specific population data at LAU and grid level is very high.
13.1.4. Overall accuracy - Marital status
There are no particular reasons for data unreliability for this topic.
Marital status data is individual and not estimated. It is determined from the Population Register supplemented with information from ANPR (National Register of the Resident Population): Marriage_date or Civil_Union_date.
Couples are determined within the Editing and Imputation (E&I) process through the 'Family Procedure' (see section 18.5).
The minimum age accepted for a married status is 16 for Italians (an age of 15 is accepted for couples married before the 1960s); for some foreign nationalities it is 14 years.
13.1.5. Overall accuracy - Family status
There are no particular reasons for data unreliability for this topic.
13.1.6. Overall accuracy - Household status
There are no particular reasons for data unreliability for this topic.
13.1.7. Overall accuracy - Current activity status
There are no particular reasons for data unreliability for this topic.
13.1.8. Overall accuracy - Occupation
There are no particular reasons for data unreliability for this topic.
13.1.9. Overall accuracy - Industry
There are no particular reasons for data unreliability for this topic.
13.1.10. Overall accuracy - Status in employment
There are no particular reasons for data unreliability for this topic.
13.1.11. Overall accuracy - Place of work
There are no particular reasons for data unreliability for this topic.
13.1.12. Overall accuracy - Educational attainment
There are no particular reasons for data unreliability for this topic.
13.1.13. Overall accuracy - Size of the locality
There are no particular reasons for data unreliability for this topic.
13.1.14. Overall accuracy - Place of birth
There are no particular reasons for data unreliability for this topic.
13.1.15. Overall accuracy - Country of citizenship
There are no particular reasons for data unreliability for this topic.
13.1.16. Overall accuracy - Year of arrival in the country
There are no particular reasons for data unreliability for this topic.
13.1.17. Overall accuracy - Residence one year before
There are no particular reasons for data unreliability for this topic.
13.1.18. Overall accuracy - Housing arrangements
There are no particular reasons for data unreliability for this topic.
13.1.19. Overall accuracy - Type of family nucleus
There are no particular reasons for data unreliability for this topic.
13.1.20. Overall accuracy - Size of family nucleus
There are no particular reasons for data unreliability for this topic.
13.1.21. Overall accuracy - Type of private household
There are no particular reasons for data unreliability for this topic.
13.1.22. Overall accuracy - Size of private household
There are no particular reasons for data unreliability for this topic.
13.1.23. Overall accuracy - Tenure status of households
There are no particular reasons for data unreliability for this topic.
13.1.24. Overall accuracy - Type of living quarter
There are no particular reasons for data unreliability for this topic, but some accuracy issues are present with reference to the homeless population.
The estimate of the homeless population is based on the registered homeless population. More precisely, the population registers (which are the main source for the register-based count) include data on the number and main characteristics of people registered at fictitious or other addresses used for the registration of homeless people and other persons without a usual residence. This figure is known to be affected both by under-coverage (many homeless people are not registered, e.g. foreigners without a permit of stay, or are not registered as such, i.e. they are registered at their previous usual residence address) and by over-coverage (fictitious addresses are used by municipalities for registering other vulnerable categories, and some people who are not homeless might try to be registered at fictitious addresses for fiscal reasons).
For these reasons, in 2025 Istat will conduct two ad hoc field surveys to estimate the actual number of homeless and collect information on their main socio-demographic characteristics and on their 'life history'.
13.1.25. Overall accuracy - Occupancy status
There are no particular reasons for data unreliability for this topic.
13.1.26. Overall accuracy - Type of ownership
There are no particular reasons for data unreliability for this topic.
13.1.27. Overall accuracy - Number of occupants
There are no particular reasons for data unreliability for this topic.
13.1.28. Overall accuracy - Useful floor space
There are no particular reasons for data unreliability for this topic.
13.1.29. Overall accuracy - Number of rooms
Not computed
13.1.30. Overall accuracy - Density standard (floor space)
There are no particular reasons for data unreliability for this topic.
13.1.31. Overall accuracy - Density standard (number of rooms)
Not computed
13.1.32. Overall accuracy - Water supply system
There are no particular reasons for data unreliability for this topic.
13.1.33. Overall accuracy - Toilet facilities
There are no particular reasons for data unreliability for this topic.
13.1.34. Overall accuracy - Bathing facilities
There are no particular reasons for data unreliability for this topic.
13.1.35. Impact of the COVID pandemic on data accuracy
None
13.2. Sampling error
This section is not applicable for the 2021 Italian Census strategy. However, we try to provide some information where possible.
For the 2021 PHC in Italy, two sample surveys (A-survey and L-survey) were carried out for different purposes, covering 6.5% of the population.
The data collected through the sample surveys were used jointly with information available in registers, through the adoption of statistical models (no estimates are produced based only on survey data).
Some topics were determined from register-based data, while for the remaining topics survey data were used to train the estimation models.
Concerning the sampling error, without an indication of the "percentage value" to be estimated and the "geographical domain" of reference, it is not possible to provide any kind of evaluation.
13.3. Non-sampling error
For the 2021 PHC in Italy, two sample surveys (A-survey and L-survey) were carried out for different purposes.
The data collected through the sample surveys were not used directly to estimate the census variables. Some topics were determined from register-based data, while for the remaining topics survey data were used to train the estimation models.
Bias deriving from model assumptions has been evaluated by comparing model-based estimates with direct estimates computed from the census surveys supporting the estimation process.
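One plausible way to carry out such a comparison is to compute, for each estimation domain, the relative difference between the model-based estimate and the direct survey estimate. The sketch below assumes this simple indicator; the domain names and figures are hypothetical and the indicator actually used by Istat is not specified here.

```python
def relative_differences(model_based, direct):
    """Relative difference (model-based minus direct, over direct) per domain."""
    return {
        domain: (model_based[domain] - direct[domain]) / direct[domain]
        for domain in direct
        if domain in model_based and direct[domain] != 0
    }


# Hypothetical estimates for two domains, for illustration only.
model_based = {"domain_A": 1050.0, "domain_B": 1960.0}
direct = {"domain_A": 1000.0, "domain_B": 2000.0}

diffs = relative_differences(model_based, direct)
for domain, diff in diffs.items():
    print(f"{domain}: {diff:+.1%}")
```

Differences that are systematically positive or negative across domains would point to model bias, whereas differences that scatter around zero are compatible with sampling variability of the direct estimates.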
Area sampling survey (A-survey) has been carried out in order to adjust for bias deriving from possible under-coverage of sampling lists.
Non-sampling errors were not computed. As mentioned above, the estimation process did not use the sample data directly, so both the sampling and non-sampling error measures for the surveys are not relevant for our process.
14.1. Timeliness
The time lag between the census reference date and the first release of data for the basic socio-demographic characteristics is one year and currently cannot be reduced, taking into account the time necessary for processing both survey and administrative data.
Data on the population by sex, age, citizenship, and education level by municipality were released about one year after the census reference date (15 December 2022). Data on current activity status and on households (no. of households by no. of components) were released in February 2023. The same data at enumeration area level were released in June 2023.
The 15 December 2022 release included data on the hard-to-reach populations (institutional population, homeless and people living in formal/informal camps). Data on conventional dwellings (number by occupancy status) were released in March 2023. Data on the population by migratory background were released in December 2023. The remaining data will be released in the course of 2024, after transmission of the census hypercubes to Eurostat (31 March 2024).
No preliminary data were released.
14.2. Punctuality
Concerning Italian dissemination, data have been released according to the target date i.e. one year after the census reference date (for the main socio-demographic characteristics). Concerning data transmission to Eurostat, no delays are foreseen with reference to the 31 March 2024 deadline.
15.1. Comparability - geographical
The data are comparable across all the different geographical areas.
With regard to grid data, the definitions, concepts and classifications adopted are all in line with the provisions of Implementing Regulation 2018/1799.
The data are therefore fully comparable at European level.
The A and L sample surveys were conducted with the same methodology throughout the territory. The data were produced with the same methodology and using nationally available sources.
The data are therefore also fully comparable between Italian regions.
15.1.1. Geographic information - data quality
No problems with the quality of the geographical information.
The Italian population was placed in the grid cells using mainly the coordinates of the population's residence addresses.
The source of the coordinates of the addresses of residence can be found at this website.
No data on positional accuracy in metres are available.
The address coordinates used are either field-detected (82%) or interpolated (18%) coordinates.
The geographical information all refers to the same date and covers the whole country.
15.2. Comparability - over time
Not applicable.
15.3. Coherence - cross domain
No problem of incoherence with statistics from other official sources.
All the information used (both statistical and geographical) refers to the same date (the census date).
There are no problems of consistency or comparability between the different spatial domains.
15.4. Coherence - internal
The data are coherent by construction.
Coherence is ensured at each geographical level.
For hypercubes that have one or more variables in common, coherence of the marginal distributions in common is guaranteed.
Any differences in values are negligible and depend only on the rounding process of the estimated values.
The estimated total cost of the first cycle of the Permanent Population and Housing Census is 162,750,000 euro (which represents the funds transferred to Istat by the State ad hoc for the first cycle of the Permanent Population and Housing Census: 2018-2021). This figure does not include the costs of preparing and processing administrative data, of project management and administrative support, or of the activities done in-house (i.e. the salaries of all the staff working on the census), including the Pilot Survey held in 2017.
It is worth mentioning that the Permanent Census is a combined census that integrates administrative data with ad hoc field survey data. The latter are collected yearly on a sample of municipalities and households. More precisely, in the first cycle (2018-2021) two ad hoc surveys were conducted annually in self-representative municipalities (i.e. those with a population over 17,800 inhabitants and smaller ones which do not rotate in the sampling scheme of the Labour Force Survey) and every four years, according to a rotation scheme, in non-self-representative municipalities (i.e. all the others). Every year about 2850 municipalities, for a total of 1,500,000 households, are involved in the surveys. In 2020 the field surveys were cancelled due to the pandemic; therefore in 2021 the number of municipalities involved was higher (4531), as the number of non-self-representative ones was doubled.
The burden on respondents and on municipalities (which are responsible for conducting the field work) is reduced (and diluted over 4 years) thanks to the adoption of a sampling strategy and to the integration with administrative data. Furthermore, a multi-mode, totally paperless data collection technique is used, with the CAWI mode offered as first option, which allows respondents considerable flexibility. In addition to the CAWI option, respondents can use the municipal collection centre for assistance in filling out the electronic questionnaire.
The breakdown by major cost components in percentage is the following: 2% for the second pilot Survey; 11% for printing enumeration materials (no paper questionnaires); 3% for communication and publicity; 84% for field operations (enumerators, training, field hardware and software).
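The percentage breakdown can be converted into approximate euro amounts with the short sketch below. It assumes that the percentages apply to the estimated total of 162,750,000 euro reported above, which the text suggests but does not state explicitly; the resulting amounts are therefore indicative only.

```python
TOTAL_EUR = 162_750_000  # estimated total cost of the first cycle (2018-2021)

# Percentage breakdown by major cost component, as reported in the text.
SHARES_PCT = {
    "second pilot survey": 2,
    "printing enumeration materials": 11,
    "communication and publicity": 3,
    "field operations": 84,
}


def cost_by_component(total, shares_pct):
    """Convert percentage shares into euro amounts (integer arithmetic)."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100%")
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


amounts = cost_by_component(TOTAL_EUR, SHARES_PCT)
for name, eur in amounts.items():
    print(f"{name}: {eur:,} euro")
```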
17.1. Data revision - policy
Census data are not subject to revision since only final data are released.
17.2. Data revision - practice
There are no revisions for Census data.
18.1. Source data
At the core of the Italian Permanent Population and Housing Census (PPHC) is the Population Register (PR). Together with the Statistical Base Register of Addresses (RSBL) and with the thematic registers on education and employment, PR provides the basis for the production of population census data in a combined census design (census data are produced by using multiple sources).
Sampling data
Two ad hoc sample surveys (Area survey and List survey) are conducted annually to measure the quality of the fully register-based population count and to collect data for variables that cannot be replaced (or can only partially be replaced) by register data.
Therefore, concerning the census outputs, the PPHC produces:
- a fully register-based population count;
- census hypercubes estimated through statistical models that jointly use information already available in registers and data collected in the field.
In the first cycle (2018-2021) of the PPHC, two ad hoc surveys are conducted annually in self-representative municipalities (i.e. those with a population over 17,800 inhabitants and smaller ones which do not rotate in the sampling scheme of the Labour Force Survey) and every four years, according to a rotation scheme, in non-self-representative municipalities (i.e. all the others). In each municipality, a sample of households is selected from the Population Register for the List survey and a sample of addresses from the Address Register for the Area survey.
Every year about 2850 municipalities are involved in the surveys, for a total of 1,400,000 households (of which 950,000 for the List survey and 450,000 for the Area survey). In 2021 the number of municipalities involved was higher (4531 out of the total 7903 municipalities in Italy), as the number of non-self-representative municipalities was double the number originally planned (in 2020 the field surveys were cancelled due to the pandemic, therefore the municipalities due to participate in 2020 were 'moved' to 2021). The households involved in the 2021 surveys were therefore 2,472,400 for the L survey and 776,097 for the A survey. The reference population is the population usually resident in Italy.
Administrative data
For the administrative data sources used to produce the census data (through either an integration process or an estimation model), see:
- ANNEX attached in section 18.1.3 (List of data sources per topic);
- ANNEX attached in section 19 (Sheet 3. "Administrative data sources").
18.1.1. List of data sources
Persons: A survey + L survey + several admin sources (see specification by topic in section 18.1.3)
Households: Population Register
Family nuclei: Population Register
Living quarters: A survey + L survey + Buildings Register
Conventional dwellings: A survey + L survey + Buildings Register
For A and L survey see details in section 18.1
For the sources used for estimates referring to the different groups of units see ANNEX (CENSUS_21NESMS_A_IT_2021_0000_an_2.xls)
For which source was used for each variable see section 18.1.3
Annexes:
List of sources associated with groups of units
18.1.1.1. List of data sources - Data on persons
For the estimation of data on persons, the data sources used (for the different census topics classifying persons) are as follows:
- Municipal population register (Anagrafe comunale della popolazione residente);
- ANPR (National Register of the Resident Population);
- Population Register (Istat);
- Register of Places (Istat);
- Building Cadaster;
- ANNCSU (National Archive of Urban Street Numbers);
- Tax register;
- Social security register;
- Administrative Register of Public employees;
- Administrative Register of Employees in public schools and universities;
- Administrative Register of Temporary workers;
- Administrative Register of primary and secondary school students;
- Administrative Register of university students;
- Register of house lease contracts;
- Register of Earnings;
- Register of vital events;
- Register of acquisition of Italian Citizenship;
- Registers of Italians resident abroad;
- Permits of stay;
- Register of circulating vehicles;
- List survey (Istat);
- Area survey (Istat);
- 2011 PHC (Istat);
- Registrations and cancellations at the registry office due to change of residence.
18.1.1.2. List of data sources - Data on households
For the estimation of data on households, the only data source used (for the different census topics classifying households) is the following:
- Population Register (Istat).
18.1.1.3. List of data sources - Data on family nuclei
For the estimation of data on family nuclei, the data sources used (for the different census topics classifying family nuclei) are as follows:
- Municipal population register (Anagrafe comunale della popolazione residente);
- ANPR (National Register of the Resident Population);
- Population Register (Istat);
- Building Cadaster;
- ANNCSU (National Archive of Urban Street Numbers);
- Register of house lease contracts;
- List survey (Istat);
- Area survey (Istat);
- 2011 PHC (Istat).
18.1.1.4. List of data sources - Data on living quarters
For the estimation of data on living quarters, the only data source used (for the different census topics classifying living quarters) is the following:
- Register of Places (Istat).
18.1.1.5. List of data sources - Data on conventional dwellings
For the estimation of data on dwellings, the data sources used (for the different census topics classifying dwellings) are as follows:
- Municipal population register (Anagrafe comunale della popolazione residente);
- ANPR (National Register of the Resident Population);
- Population Register (Istat);
- Register of Places (Istat);
- Building Cadaster;
- ANNCSU (National Archive of Urban Street Numbers);
- Register of house lease contracts;
- List survey (Istat);
- Area survey (Istat);
- 2011 PHC (Istat).
18.1.2. Classification of data sources
Classification of the data sources as requested by Reg. 2017/881, Annex point 2.1.
18.1.2.1. Classification of data sources - Data on persons
04.Combination of register-based censuses and sample surveys
18.1.2.2. Classification of data sources - Data on households
04.Combination of register-based censuses and sample surveys
18.1.2.3. Classification of data sources - Data on family nuclei
04.Combination of register-based censuses and sample surveys
18.1.2.4. Classification of data sources - Data on living quarters
04.Combination of register-based censuses and sample surveys
18.1.2.5. Classification of data sources - Data on conventional dwellings
04.Combination of register-based censuses and sample surveys
18.1.3. List of data sources per topic
See table in annex (multiple sources used, according to the topic).
Only the population count is fully register-based and the main variables for the persons (sex, age, marital status, place of birth and citizenship) are currently derived totally from the Population Register.
Annexes:
List of sources associated with topics
18.1.4. Adequacy of data sources
The Italian Permanent Census strategy guarantees compliance with the essential features of a census (Art. 4(4) of Reg. 763/2008), as requested by Reg. 2017/881, Annex point 2.4.
Information is provided separately for each essential feature.
18.1.4.1. Adequacy of data sources - Individual enumeration
Each individual included in the population count is uniquely identified in the population register and/or in another administrative source, while the individual variables for each individual included in the population count are either derived from the population register or estimated through model-based estimates (either the exact value or a probability is assigned to each individual included in the count).
The characteristics of each statistical unit are recorded separately, so that each characteristic can be cross-classified with others (limited to the planned cross-tabulations and estimation domains). Additional cross-tabulations require ad hoc estimation models.
18.1.4.2. Adequacy of data sources - Simultaneity
All information refers to the same point in time (reference date) thanks to the registers and to model-based indirect estimates. Even though the smaller municipalities participate in the surveys once every four years, indirect estimates are calculated each year for all the municipalities (including those not participating in the surveys in that year).
18.1.4.3. Adequacy of data sources - Universality within the defined territory
Data are provided for all statistical units in a defined territory (for persons in particular, data are provided for all usual residents in a defined territory). Data produced yearly cover the whole national territory.
18.1.4.4. Adequacy of data sources - Availability of small-area data
Data are available for small geographical areas and for small subgroups of statistical units. More precisely, data are produced mainly at LAU2 level. For a few variables they are released at Enumeration Area level (and at grid level).
For small subgroups of statistical units or for small domains, the feasibility of releasing data on estimated variables (i.e. those not derived from registers) and on the related cross-tabulations is subject to the definition of ad hoc statistical models and to the evaluation of the accuracy of the estimates.
18.1.4.5. Adequacy of data sources - Defined periodicity
Since 2018, the PPHC has produced and released census-type data every year. Thus, Italy complies with the ten-year periodicity (and for selected variables data are released more frequently).
18.2. Frequency of data collection
The frequency of data collection is annual (but every year census data are produced for all the municipalities).
The information provided in the quality report refers mainly to the 2021 wave.
18.3. Data collection
In the following sub-sections, all methodological aspects concerning "data collection", "the sampling design" for the surveys supporting the 2021 PHC, "data processing", "editing and imputation" procedures, and the "estimation methodologies" adopted will be explained.
18.3.1. Data collection - Questionnaire based data
Design and testing of questionnaire
The questionnaires were tested first in 2015 (first experimental survey) and then in 2017 (Pilot survey). Attached are the questionnaires for the 2021 wave of the A and L surveys in the Italian (Q_A_IT, Q_L_IT) and English versions (Q_A_EN, Q_L_EN).
Preparation of field work - Data collection
Every year about 2850 municipalities are involved in the surveys, for a total of 1,400,000 households (of which 950,000 for the List survey and 450,000 for the Area survey). In 2021 the number of municipalities involved was higher (4531 out of the total 7903 municipalities in Italy), as the number of non-self-representative municipalities was double the number originally planned (in 2020 the field surveys were cancelled due to the pandemic, therefore the municipalities due to participate in 2020 were 'moved' to 2021). The households involved in the 2021 surveys were therefore 2,472,400 for the L survey and 776,097 for the A survey. The reference population is the population usually resident in Italy.
The household sample for the List survey is extracted from the Population register linked to the Register of addresses (for each municipality participating in the survey a sample of households registered in the municipality population register at the beginning of the reference year is extracted). The address sample for the Area survey is extracted from the address register updated at the end of the year preceding the reference year.
The field work is the responsibility of the municipalities, which recruit enumerators every year. Training is performed centrally by Istat through an online platform (blended training model: self-training with centrally pre-defined training materials + virtual classrooms).
The maps (static and dynamic) for the Area survey are provided to municipalities by the Istat GIS Division. Maps are only used when the enumeration is performed for the whole EA (Enumeration Area), due to the low quality of the addresses in the address register; otherwise only selected addresses are sampled and have to be canvassed 'blindly' by enumerators, with a door-to-door technique. For the List survey, the data collection is performed as in a normal household survey: only the sampled household has to be surveyed, and if the household does not complete the questionnaire online, enumerators follow up after some weeks.
For both the Area and the List survey there is a legal obligation to provide census information, and sanctions are foreseen for those who do not comply with this obligation. Nevertheless, the communication campaign (see below) is aimed at encouraging people's participation, based on the usefulness of the census for the country.
Communicating the Permanent Population Census: a special campaign
In 2021, Istat implemented a special communication campaign to support the new edition of the Permanent Population Census, designed to overcome the potential effects of the COVID-19 emergency on household participation after the suspension of data collection operations the year before.
To promote a favourable climate for census operations, Istat implemented a communication strategy with a strong institutional emphasis on transparency and sharing, aimed primarily at discussion and dialogue with citizens and users. The main objectives of the communication campaign were to inform and also to engage respondents. To achieve these goals, the challenge was to promote a cross-media strategy for communicating and disseminating statistical data, integrating new media, social channels, the institutional website and other web services, in order to develop new communication systems capable of reaching a greater number of users, to give content a viral treatment with a multiplier effect, and to stimulate participation in the census.
Calibrated on the architecture and phases of the census operation and segmented according to the different targets and stakeholders, the communication campaign created a narrative able to involve not only those who were called to respond to the census, but also those who would benefit from its results.
Using direct, familiar and engaging language, the campaign was highly innovative in its intense use of social networks; the simplification of information through infographics and video tutorials; engagement through a system of contests; the dissemination of information via digital PR (Public Relations); and the organization of virtual events. At the same time, to ensure the widest diffusion of messages and to reach also the general public for whom traditional media were the only source of information, an advertising plan was carried out on national and local traditional media (television, radio, press, outdoor...), and Istat produced a special project on public television (Rai 2) called “DATA COMEDY SHOW”. It was a new comedy panel show which, in eight episodes, told the story of the country through official statistics: a challenge that Istat launched to promote official statistical information to the general public through an entertainment programme, and also an innovative way to enhance the information assets of the permanent censuses.
Annexes:
Questionnaire used for the areal survey (A-survey) in English language
Questionnaire used for the list survey (L-survey) in English language
Questionnaire used for the areal survey (A-survey) in Italian language
Questionnaire used for the list survey (L-survey) in Italian language
18.3.2. Data collection - Register based data
Several registers are used for census production. The core of the census is the statistical Population Register. Its main administrative data sources are the population registers of Italian municipalities (merged in a centralized administrative population register: ANPR). This register provides the basis of the fully register-based population count, but to obtain the official census count it is ‘corrected’ using the Signs of Life (SoL) methodology (see section 18.6). Another statistical register used to implement the SoL method is AIDA, i.e. the Integrated Data Base of Usual Residents, a thematic register that integrates all other sources (besides RBI) considered relevant for determining the usually resident population.
Another fundamental base register used both for population and territorial statistics is RSBL (Registro Statistico di Base dei Luoghi) i.e. the Register of places, including addresses, buildings and dwellings. The main sources of RSBL are the National Address Register (ANNCSU) and the Cadaster.
Other statistical registers used for the production of population census data are the thematic registers on Education (based on administrative sources provided by the Ministry of Education and the Ministry of University and Research, and on 2011 census data) and on Employment (based on several administrative sources provided mainly by the Social security and Tax authorities). Another fundamental statistical register used in the production of official statistics is the Statistical Business register (ASIA: Archivio statistico delle imprese attive).
The linkage of micro-data used for the PPHC (Permanent Population and Housing Census) is performed by using the SIM code, a readily available unique identifier assigned centrally by Istat through a pseudonymisation process. Indeed, all the administrative sources received by Istat, before being used for any production process, are processed in order to assign to each record present in the source a SIM code. After the pseudonymisation takes place, the SIM output is made available to internal users that will use it, within their statistical processes, to produce the outputs (i.e. by linking the different sources through the SIM code.) This unique identifier (SIM code) makes it possible to identify the same individual or economic unit across the different administrative datasets and over time.
The SIM code is the result of a three-step statistical processing performed on all the administrative sources received by Istat: a first phase in which the administrative datasets received from the owners are loaded into the SIM Oracle DB, a second phase of pre-treatment in which pseudonymisation takes place, and a third phase in which the SIM output is made available to internal users (who in turn will use it, within their statistical processes, to produce the outputs). In this perspective, the pseudonymisation process is a record linkage process which recognizes the units with respect to the units already present in the system and assigns them the same code.
Record linkage procedures (deterministic procedures are used) are needed because the identifying variables can contain errors. The integration stage is incremental: as the datasets are acquired and loaded into the DB tables, units (individuals or economic units) are progressively integrated in the SIM tables with the data already present. The integration process is different for each type of unit. For individuals, the integration strategy depends on which identifying variables are available in the dataset, and on their quality, among the following: TAX CODE, LAST NAME, NAME, SEX, DATE OF BIRTH, PLACE OF BIRTH (PROVINCE CODE, MUNICIPALITY CODE, COUNTRY). In general, in the first step linkage is made on the equality / similarity of all available key variables; in the following steps the available variables are used alternately, up to the final steps in which only the TAX CODE is used as a linkage variable. The pseudonymisation ends with the attribution of the identification code (the SIM code) within the Table SIM Individuals, and each instance of the dataset becomes part of the Table.
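The multi-pass deterministic linkage described above can be illustrated with a minimal sketch. It is not the SIM implementation: the order of the passes, the exact-match rule after upper-casing, and the names Person, PASSES, match_key and assign_codes are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    tax_code: Optional[str]
    last_name: str
    name: str
    sex: str
    date_of_birth: str
    place_of_birth: str


# Assumed pass order: all key variables first, TAX CODE alone last.
PASSES = [
    ("tax_code", "last_name", "name", "sex", "date_of_birth", "place_of_birth"),
    ("last_name", "name", "sex", "date_of_birth", "place_of_birth"),
    ("tax_code",),
]


def match_key(person, fields):
    """Build a normalised key; return None if any component is missing."""
    values = [getattr(person, f) for f in fields]
    if any(v is None or v == "" for v in values):
        return None
    return tuple(str(v).strip().upper() for v in values)


def assign_codes(incoming, registry):
    """Return one code per incoming record, extending `registry` in place.

    `registry` maps an integer code to the Person already known under it.
    A record that matches nothing receives a new code (incremental integration).
    """
    codes = []
    for record in incoming:
        found = None
        for fields in PASSES:
            key = match_key(record, fields)
            if key is None:
                continue  # this pass cannot be used for this record
            for code, known in registry.items():
                if match_key(known, fields) == key:
                    found = code
                    break
            if found is not None:
                break
        if found is None:
            found = max(registry, default=0) + 1
            registry[found] = record
        codes.append(found)
    return codes
```

A record with a misspelled surname but a correct tax code is only recognised in the last pass, while a record without a tax code is recognised in the second pass; this is why the passes are tried from the strictest key to the weakest.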
18.3.3. Data collection - Sample survey based data
The sampling design
Two ad hoc sample surveys (Area survey and List survey) are conducted annually to measure the quality of the fully register-based population count estimation and to collect data for variables that cannot be replaced (or can only partially be replaced) by register data.
In the first cycle (2018-2021) of the Permanent PHC in Italy, two ad hoc surveys are conducted annually in self-representative municipalities (i.e. those with a population over 17,800 inhabitants and smaller ones which do not rotate in the sampling scheme of the Labour Force Survey) and every four years, according to a rotation scheme, in non-self-representative municipalities (i.e. all the others).
For the selection of the sample, a two-stage sampling design is adopted; at the first stage, municipalities are selected; at the second stage, in each selected municipality, a sample of households is selected from the Population Register for the List survey and a sample of addresses from the Address Register for the Area survey.
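The two-stage selection described above can be sketched as follows. This is only an illustration under stated assumptions: self-representative municipalities are always included, the other municipalities are drawn by simple random sampling (the report describes a rotation scheme instead), and the function and parameter names are hypothetical.

```python
import random


def two_stage_sample(frame, self_representative, n_other,
                     households_per_municipality, seed=0):
    """Select municipalities (stage 1), then households within them (stage 2).

    `frame` maps a municipality code to the list of its household identifiers.
    """
    rng = random.Random(seed)

    # Stage 1: self-representative municipalities enter with certainty;
    # the others are sampled (assumption: simple random sampling).
    certain = sorted(m for m in frame if m in self_representative)
    others = sorted(m for m in frame if m not in self_representative)
    stage_one = certain + rng.sample(others, n_other)

    # Stage 2: sample households inside each selected municipality.
    sample = {}
    for municipality in stage_one:
        units = frame[municipality]
        size = min(households_per_municipality, len(units))
        sample[municipality] = rng.sample(units, size)
    return sample
```

For the Area survey the same structure would apply with addresses instead of households as second-stage units.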
Every year about 2850 municipalities are involved in the surveys, for a total of 1,400,000 households (of which 950,000 for the List survey and 450,000 for the Area survey). In 2021 the number of municipalities involved was higher (4531 out of the total 7903 municipalities in Italy), as the number of non-self-representative municipalities was double the number originally planned (in 2020 the field surveys were cancelled due to the pandemic, therefore the municipalities due to participate in 2020 were 'moved' to 2021). The households involved in the 2021 surveys were therefore 2,472,400 for the L survey and 776,097 for the A survey. The reference population is the population usually resident in Italy.
Methodologies used for any estimations, models or imputations
At the core of the Italian Permanent Population and Housing Census (PPHC) is the Population Register (PR). Together with the Statistical Base Register of Addresses (RSBL) and with the thematic registers on education and employment, PR provides the basis for the production of population census data in a combined census design (census data are produced by using multiple sources).
Concerning the census outputs, the PPHC produces:
- a fully register-based population count;
- census hypercubes estimated by the joint use of information already available in registers and of data collected on the field, through the use of statistical models.
For the first output (the population count), see the SoL method in section 18.8.
For the second output (the hypercubes), where data result from the integration of sources and are register-based only, the output value is either '0' or '1': the value '1' is assigned to one and only one of the classification modes of the considered variable, while all the others are assigned '0'.
For data that are the result of an estimation process through the application of a specific model (which integrates information from different sources as illustrated in the 'sources X topics' diagram in section 18.1.3), there are three different outputs depending on the approach employed:
A) the output data is a "0" or "1" value where the "1" value is assigned to one and only one of the classification modes of the considered variable (e.g. the "employed" mode of the CAS variable, the "educational attainment" values for individuals without administrative signals).
B) the output datum is a probability value (between '0' and '1') assigned to all the classification modes considered for that specific topic (such that the sum of the probabilities returns 1) - (E.g. the variable TSH related to hypercubes on households, the variables POC, OWS, NOC, UFS, DFS related to hypercubes on dwellings).
C) the output data is the result of using (according to a sequential logic) the two methods illustrated above, first A) and then B).
Example - for the variable 'current activity status - CAS', we first estimate with certainty whether the generic individual is "employed" (Yes/No); then, in the case of "not employed", we estimate the "probabilities" for the other modes of the CAS classification (this applies to the remaining modalities of the CAS variable, except for the modality age 0-14, which is taken from the register).
Once the micro-data of each statistical unit (individual; household; dwelling) has been "constructed" in a complete manner, the absolute frequency referring to any cross cell is obtained by summing the values at the micro-data level relative to the units that belong to that cross cell: sum of values in the interval [0;1] extremes included. However, the presence of decimal values in the probabilities means that the count frequencies thus obtained do not necessarily coincide with integer values.
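The aggregation step described above, in which each unit contributes a value in [0;1] to every cell it may belong to, can be sketched as follows. The unit records and the CAS mode labels (EMP, UNE, INACT) are made up for illustration and are not the official codes.

```python
from collections import defaultdict

# Each unit carries a register-based variable (sex) and, for CAS, one value
# per classification mode: 0/1 when assigned with certainty, otherwise
# probabilities that sum to 1.
units = [
    {"sex": "F", "cas": {"EMP": 1.0, "UNE": 0.0, "INACT": 0.0}},
    {"sex": "F", "cas": {"EMP": 0.0, "UNE": 0.25, "INACT": 0.75}},
    {"sex": "M", "cas": {"EMP": 0.0, "UNE": 0.5, "INACT": 0.5}},
]


def cell_frequencies(units):
    """Sum the micro-level values of all units falling in each (sex, CAS) cell."""
    cells = defaultdict(float)
    for unit in units:
        for mode, value in unit["cas"].items():
            cells[(unit["sex"], mode)] += value
    return dict(cells)


frequencies = cell_frequencies(units)
```

The cell (F, INACT) receives 0.75 here, which shows why the frequencies obtained in this way are not necessarily integers, while the grand total still equals the number of units.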
In order to produce the integer values, a "rounding procedure" was developed and applied, which guarantees, for each hypercube, the highest possible degree of consistency between the rounded values referring to the cells of maximum intersection and the rounded values on the marginal cells.
From an operational point of view, the rounding process was applied separately to each hypercube; the operation ensured consistency between the rounded values referring to the generic crossing cell and the rounded values on the marginals of the specific hypercube. The rounding process adopted, which is based on controlled rounding techniques, guarantees integer values for each cell while preserving additivity.
In some cases, however, due to rounding, a variable appearing in one or more hypercubes may have different marginal distributions in the different hypercubes in which it appears. The differences, however, are generally negligible.
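The idea of rounding while preserving additivity can be illustrated with a much simpler stand-in than the controlled rounding actually applied: a largest-remainder rule on a single row of cells, which keeps the rounded cells consistent with the rounded row total. The function name is hypothetical, and the sketch does not handle the multi-way marginals of a full hypercube.

```python
import math


def round_preserving_total(values):
    """Round non-negative values to integers whose sum equals the rounded total."""
    target = round(sum(values))
    floors = [math.floor(v) for v in values]
    shortfall = target - sum(floors)

    # Assign the remaining units to the cells with the largest fractional parts.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i],
                   reverse=True)
    result = list(floors)
    for i in order[:shortfall]:
        result[i] += 1
    return result
```

Because each hypercube is rounded on its own, the same variable can end up with slightly different rounded marginals in two hypercubes, which is the effect noted above.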
For more details on the methodology used to estimate the data on the different topics, see ANNEX in section 19 (Sheet 2. "Data Sources").
For more details on the methodology used for the editing and imputation (E&I) procedures on the different groups of topics, see ANNEX in section 19 (Sheet 5. "Imputation method").
Possible biases in the estimation due to methodologies applied
The bias of the estimates is related to the validity of the basic hypothesis linking the variables of interest with the auxiliary variables used, and thus to the goodness of fit of the models used.
For more details, see sections 13.2 and 13.3.
The standard error was not calculated.
18.3.4. Data collection - Data from combined methods
Description of the methods
See sub-section "Methodologies used for any estimations, models or imputations" in section 18.3.3.
For more details on the methodology used to estimate the data on the different topics see:
- ANNEX attached in section 18.5.
- ANNEX attached in section 19 (Sheet 2. "Data Sources")
18.4. Data validation
The 'data validation' operations only concern the data referring to the hypercubes required by Eurostat for European dissemination, down to the territorial level for which themes and their classifications are requested.
Database Preparation
For the validation of the output data, a process was implemented that integrates the produced estimates and reference data into a data structure containing
(a) the tables of census results to be validated and disseminated
(b) the data tables of the reference sources requested by the thematic experts for validation (i.e. past censuses, ongoing Istat surveys, population flows, administrative sources).
For the management of census data, the Istat IT Department has set up a specific production environment consisting of a microdata DB and an aggregated data DB specifically for Eurostat dissemination. For data validation processes, a primary data warehouse environment was also created for data analysis and report production.
Validation of simple distributions of topics of interest
Once the IT environment was defined and both the data to be validated and the benchmark information were loaded, the validation phase began according to three different modes of operation. In order to release the data, validation had to be successful for each of the three modes.
1. Territorial comparisons
For each topic, treated individually, the percentage distribution calculated on the finest territorial domains was compared with that referring to the superior territorial domain to which it belonged (LAU2 vs NUTS3, NUTS3 vs NUTS2, and so on), in order to verify the absence of significant deviations.
For each classification mode, a tolerance threshold was defined (in percentage points) linked to territorial and demographic factors; in the event of deviations beyond the threshold, it was checked whether the data was acceptable (in the case of percentage deviations relating to: low absolute data; territories that had undergone a demographic increase or decrease); if not, it was flagged as 'anomalous' and sent back to the estimation process for a subsequent release of output data.
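The territorial comparison can be sketched as a check of percentage-point deviations against per-mode thresholds. The function names and the threshold values used in any call are illustrative assumptions; the report does not state the thresholds actually applied.

```python
def percentage_distribution(counts):
    """Convert absolute counts per classification mode into percentages."""
    total = sum(counts.values())
    return {mode: 100.0 * n / total for mode, n in counts.items()}


def flag_deviations(child_counts, parent_counts, thresholds):
    """Return the modes whose deviation (in percentage points) between the
    finer domain and its parent domain exceeds the tolerance threshold."""
    child = percentage_distribution(child_counts)
    parent = percentage_distribution(parent_counts)
    flagged = {}
    for mode, limit in thresholds.items():
        gap = abs(child.get(mode, 0.0) - parent.get(mode, 0.0))
        if gap > limit:
            flagged[mode] = gap
    return flagged
```

A flagged mode is only a candidate anomaly: as described above, it is then checked against low absolute counts or demographic changes before being sent back to the estimation process.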
2. Longitudinal comparisons
For each topic, treated individually, the percentage distribution referring to a specific domain (NUTS2, NUTS3) was compared with that of the same domain referring to a previous census occasion (2011, 2018, 2019, 2020 as appropriate). For example: for the variable 'Residence one year earlier', 2011 census data were used; for the variable 'Current activity status', 2019 census data were used; for the variable 'Education level', 2020 census data were used.
In some cases, summary indicators (e.g. dissimilarity indices) were also calculated to compare the distributions.
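As an illustration of such a summary indicator, one common form of the dissimilarity index is half the sum of the absolute differences between two relative distributions. The report does not specify which formula was used, so this choice and the function name are assumptions.

```python
def dissimilarity_index(counts_a, counts_b):
    """Half the sum of absolute differences between two relative distributions.

    Returns 0 for identical distributions and 1 for completely disjoint ones.
    """
    modes = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return 0.5 * sum(
        abs(counts_a.get(m, 0) / total_a - counts_b.get(m, 0) / total_b)
        for m in modes
    )
```

The index can be read as the share of units that would have to change classification mode for the two distributions to coincide.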
3. Comparison with benchmark data
For each topic, treated individually, the percentage distribution referring to a specific domain (NUTS2, NUTS3) was compared with that for the same domain referring to benchmark data available from current ISTAT surveys or administrative sources.
Again, summary indicators were calculated to check for possible anomalies in the estimates produced.
Validation of cross-tabulated data
Once the simple distributions had been validated, the data obtained by crossing two or more topics were validated in a similar way. The cross-tabulations chosen were those capable of highlighting information inconsistencies (e.g. individuals 0-10 years old with the educational level 'Bachelor's degree').
In fact, since the census results come from the application of an estimation methodology that integrates data from administrative sources with sample data, estimates referring to specific cross-tabulations could turn out to be logically inconsistent. Such inconsistencies were rare; where they occurred, they led to a revision of the estimation methodology and a new release of the output data.
A final validation step involved the analysis of all hypercubes, which excluded the presence of outliers in individual cells.
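The kind of consistency check applied to cross-tabulated data (such as young children reported with a university degree) can be sketched as a simple rule over the cells. The cell layout, the education labels and the function name are hypothetical.

```python
def inconsistent_cells(cells, max_age=10, degree_levels=("BACHELOR",)):
    """Return cells with a positive estimate for a logically impossible crossing.

    `cells` maps (age, education_level) to the estimated count.
    """
    return {
        key: count
        for key, count in cells.items()
        if key[0] <= max_age and key[1] in degree_levels and count > 0
    }
```

An empty result means that the rule found nothing to send back to the estimation process.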
Benchmark Data Sources
The following benchmark data sources were used to validate the data:
- PHC 2011; PHC 2018; PHC 2019; PHC 2020;
- Istat current surveys (Labour Force Survey; Registrations/Cancellations at the registry office survey);
- Building Cadaster;
- Area Survey and List Survey conducted for PHC 2021.
18.5. Data compilation
Data capturing and Coding
Concerning data capture for survey data, the data collection is totally paperless (no paper questionnaires used). The coding of territorial variables is performed directly during enumeration (no open text answers are collected; the respondents select the relevant answer through drop-down menus).
Concerning register data, all the administrative sources received by Istat, before being used for any production process, are processed in order to assign to each record present in the source a SIM code. After the pseudonymisation takes place, the SIM output is made available to internal users that will use it, within their statistical processes, to produce the outputs (i.e. by linking the different sources through the SIM code.)
The SIM code is the only variable used for linking the different sources for persons data. Another code used is the address identifier (CUI), used both for the survey sampling frame and for the production of census data at sub-municipal level. Details on the assignment of the SIM code have been provided in section 18.3.2.
Use of signs of life (SoL) to estimate the population at LAU2 level
The SoL method is applied to implement the usual residence definition according to Regulation N. 763/2008. All the sources listed under 18.1.3 (organised in a smaller number of statistical registers) are being used to estimate the actual presence of a person at their registered address i.e. to correct the Population Register (ANPR according to the Italian acronym, which is the basis of the census count) through the use of classification criteria applied to individual records in statistical registers. For details see section 18.6.
Data compilation: pre-processing, de-duplication, editing and imputation
The data compilation operations concern only the A-survey and L-survey sample data collected in the field; only the household type and nucleus variables were determined following the Editing and Imputation process of the data from the Population Register of Individuals, enriched with information from ANPR (National Register of the Resident Population).
Data pre-processing
As the surveys are paperless, the information provided by respondents or surveyors is automatically recorded in the 'Acquisition System'.
The Acquisition System contains all data compilations, including partial compilations saved by respondents before completing and sending the questionnaire; therefore, the first operation performed in the "Production System" is to identify, for each questionnaire code, the compilation containing the most information.
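A minimal sketch of this selection step is given below. It assumes that "most information" can be measured as the number of answered items and that ties are resolved in favour of the first compilation encountered; neither rule is stated in the text, and the record layout is hypothetical.

```python
def most_complete(compilations):
    """Return, per questionnaire code, the compilation with most answered items."""

    def answered(compilation):
        # Count items that are neither missing nor empty (assumed measure).
        return sum(1 for v in compilation["answers"].values() if v not in (None, ""))

    best = {}
    for compilation in compilations:
        code = compilation["codquest"]
        # Strict comparison: on ties the earlier compilation is kept (assumption).
        if code not in best or answered(compilation) > answered(best[code]):
            best[code] = compilation
    return best


# Made-up example: two saves of questionnaire Q1, one of Q2.
compilations = [
    {"codquest": "Q1", "answers": {"sex": "F", "age": None}},
    {"codquest": "Q1", "answers": {"sex": "F", "age": 41}},
    {"codquest": "Q2", "answers": {"sex": "", "age": 30}},
]
selected = most_complete(compilations)
```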
Individuals belonging to households that have fully or partially completed (partial non-response) the questionnaire are subjected to the SIM-code attribution processes for the anonymisation of individual records.
De-duplication phase
In the de-duplication phase, duplicate questionnaires and duplicate individuals are identified, within the same survey type and regardless of location. These duplicates are retained in the first version of the data tables (a version that can always be consulted) but are not reflected in the individual data tables, which move on to the next stages (check and correction; estimation).
Individuals detected in both the A-survey and the L-survey are not considered duplicates, so the same individual can appear twice in the individual data table, once per survey. This situation is handled in the integration processes with the register data, when one of the two records has to be chosen (in 2021 the record from the A-survey was chosen).
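The two rules can be sketched as follows, assuming that duplicate individuals are recognised by their SIM code (the actual matching criteria are not described here) and using a hypothetical record layout.

```python
def deduplicate(records):
    """Drop repeated individuals within the same survey type.

    The same person may remain once in survey A and once in survey L.
    """
    seen = set()
    kept = []
    for record in records:
        key = (record["survey"], record["sim"])
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def choose_for_integration(records, preferred="A"):
    """Keep one record per person, preferring one survey (A in 2021)."""
    chosen = {}
    for record in records:
        current = chosen.get(record["sim"])
        if current is None or (
            record["survey"] == preferred and current["survey"] != preferred
        ):
            chosen[record["sim"]] = record
    return chosen


# Made-up example data.
records = [
    {"survey": "L", "sim": "S001"},
    {"survey": "L", "sim": "S001"},  # duplicate within the L survey
    {"survey": "A", "sim": "S001"},  # same person in the A survey: kept
    {"survey": "L", "sim": "S002"},
]
deduplicated = deduplicate(records)
integrated = choose_for_integration(deduplicated)
```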
The pseudonymised (through the SIM code) and de-duplicated data in the Production System undergo the Editing and Imputation procedures jointly for the two surveys (A and L).
Editing and Imputation (E&I) of individual variables defining the legal population
The first phase of the E&I process focuses on the individual variables that define the structure of the population (gender; date of birth; citizenship, Italian/foreigner) and determine the paths through the questionnaire.
The methodology applied provides for concordance checks between the variables surveyed, those in the Population Register and those in the stock data coming from ANPR (National Register of the Resident Population). Data for which discrepancies are found for at least one of the variables considered are corrected by deterministic imputation algorithms that use, where possible, other variables from the questionnaire and auxiliary variables specifically implemented, also in order to keep the information of individuals within the same household congruent.
Once these variables have been fixed more or less definitively, the E&I processes continue in parallel for different subject areas of the questionnaire.
Subject area A: E&I of demographic and family variables
In the first step, E&I is applied to the variables subject to congruence constraints between individuals belonging to the same household (sex, age, relationship, marital status, year of marriage, marital status before last marriage, foreign citizenship status, etc.) in order to restore congruence between them. The E&I methodology provides for the following steps, carried out cyclically until no erroneous records remain:
a) execution of the Family Procedure (see "The Family Procedure (FP)" below);
b) editing runs to detect families with inconsistent data; deterministic or manual corrections are performed on these data;
c) execution of the Family Procedure on erroneous families only;
d) possible reiteration from point a).
Since the Family Procedure checks and corrects the information and, at the same time, deterministically assigns the variables defining the family type and the households, a validation of the aggregated data is carried out at the end of the cyclical process and, if necessary, targeted corrections are made, leading to new correction cycles.
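The control flow of steps a) to d) can be sketched as a loop that stops when the editing run finds no erroneous household. This is only a schematic illustration: the three callables and the toy rules are hypothetical placeholders and do not reproduce the actual Family Procedure or its edit rules.

```python
def run_ei_cycle(households, family_procedure, has_errors, correct, max_cycles=10):
    """Repeat edit-and-correct cycles until no erroneous household remains."""
    # Step a): run the procedure on every household.
    households = [family_procedure(h) for h in households]
    cycles = 0
    while True:
        # Step b): editing run to detect households with inconsistent data.
        erroneous = [i for i, h in enumerate(households) if has_errors(h)]
        if not erroneous:
            return households, cycles
        if cycles == max_cycles:
            raise RuntimeError("erroneous households remain after max_cycles")
        for i in erroneous:
            # Step b) correction, then step c): rerun on erroneous households only.
            households[i] = family_procedure(correct(households[i]))
        # Step d): iterate again.
        cycles += 1


# Toy placeholders, for illustration only.
def toy_procedure(household):
    family_type = "couple" if household["members"] == 2 else "other"
    return {**household, "family_type": family_type}


def toy_has_errors(household):
    return household["members"] < 1


def toy_correct(household):
    return {**household, "members": 1}


result, cycles = run_ei_cycle(
    [{"members": 2}, {"members": 0}], toy_procedure, toy_has_errors, toy_correct
)
```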
In the second phase, the other individual variables on citizenship and residence are corrected by identifying, via compatibility rules, the incorrect individuals and correcting them using deterministic, probabilistic or manual methods.
The household type and nucleus variables (26 in all) were not estimated on the basis of the sample data of the permanent population census, but are the result of a complex and innovative E&I process carried out using data from the Population Register enriched with information from the ANPR stock data.
Subject area B: E&I of socio-economic and commuting variables.
These E&I processes first involve checking and correcting the core variables using a methodology that can be schematised according to the following steps:
a) internal compatibility checking of responses by blocks of variables in the questionnaire to identify erroneous data;
b) verification of the answers by comparison with benchmark information (micro and macro) present on Istat's thematic registers (Integrated Bases of Educational Qualifications; Labour Register) or with internal information sources (origin-destination matrices for commuting);
c) correction of erroneous data with deterministic, probabilistic or donor methods.
The correction of variable values is then followed by a validation of the aggregated data which, if necessary, determines targeted corrections on which new correction cycles can be performed.
Next, the E&I of the non-core variables is carried out by means of compatibility rules with the variables corrected in the first step and by using various methods for correcting missing, anomalous or incorrect data; deterministic or probabilistic imputation methods are then applied to these data.
Subject area C: E&I of the housing variables
For the correction of the variables in the Housing section of the questionnaire, we first identified the clusters of co-habiting households, since households co-habiting in the same accommodation have to report the same information on its characteristics. The identification of clusters of co-habiting households concerned the questionnaires collected by the Area survey. Where several households co-habited in the same dwelling, each household was assigned a questionnaire. For each housing unit with several co-habiting households, a 'father' household was identified. For the households co-habiting with the 'father' unit, the surveyor entered in the ID_UNIT_PADRE field the questionnaire code (CODQUEST) of the 'father' household, so that the households co-habiting in the same dwelling could be uniquely identified. In summary, the E&I strategy for co-habiting households involves the following steps:
- Identification of co-habiting clusters;
- Correction of co-habiting variables (COAB, NFAM, NOCC);
- Correction of co-housing variables.
Having identified the co-habiting households and made the information on the respective dwellings consistent, we proceeded with the correction of the housing variables. The first step was to check for violated rules; the values were then imputed with a probabilistic method that respects the distribution of the values in the subset of exact records. For each variable, the strata within which the imputation was to be carried out were identified.
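One simple way to impute while respecting the within-stratum distribution of exact records is to draw each missing value at random from the valid values observed in the same stratum. The sketch below illustrates that idea only; it is not necessarily the method used by Istat, and the variable name, stratum key and validity rule are hypothetical.

```python
import random
from collections import defaultdict


def impute_within_strata(records, variable, stratum_key, is_valid, rng):
    """Replace invalid values with a random draw from valid values in the stratum."""
    donors = defaultdict(list)
    for record in records:
        if is_valid(record[variable]):
            donors[record[stratum_key]].append(record[variable])

    imputed = []
    for record in records:
        record = dict(record)  # do not modify the input records
        pool = donors.get(record[stratum_key], [])
        if not is_valid(record[variable]) and pool:
            # Drawing uniformly from observed values reproduces, in expectation,
            # the empirical distribution of the exact records in the stratum.
            record[variable] = rng.choice(pool)
        imputed.append(record)
    return imputed


# Made-up example: number of rooms, stratified by a hypothetical dwelling class.
records = [
    {"stratum": "s1", "rooms": 3},
    {"stratum": "s1", "rooms": 3},
    {"stratum": "s1", "rooms": None},
    {"stratum": "s2", "rooms": None},  # no donors in this stratum
]
imputed = impute_within_strata(
    records, "rooms", "stratum", lambda v: v is not None, random.Random(0)
)
```

Records in a stratum without any valid donor are left unchanged in this sketch and would need a separate treatment.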
Deletion of Records
In general, the E&I procedures provided for the deletion of records only in cases of clearly erroneous data due to 'apparent' (invented or randomly produced) compilations. Very few records were entirely deleted in the sample surveys supporting the Italian PHC 2021.
The Family Procedure (FP)
The Family Procedure is software developed at Istat by computer scientists and statisticians for the control and correction of sample data from social surveys, and adapted to the needs of the permanent population census for the editing and imputation of family registry variables (relationships, marital status, year of marriage). It uses complex control algorithms based on the determination of potential couples within the household (with a system of scores given to all combinations of pairs of individuals in the family) and on combinations of variable strings, followed by deterministic correction. Once the family registry variables have been corrected, the procedure determines all the variables defining the family type and nuclei.
Methodologies adopted for the estimation of census topics
See sub-section "Methodologies used for any estimations, models or imputations" in section 18.3.3.
For more details on the estimation methods adopted in the 2021 PHC in Italy, see the ANNEX attached in this section.
Annexes:
Information on estimation methods adopted in 2021 PHC in Italy
18.6. Adjustment
The SoL method is applied to implement the usual residence definition according to Regulation (EC) No 763/2008. All the sources listed under 18.1.3 (organised in a smaller number of statistical registers) are used to estimate the actual presence of a person at their registered address, i.e. to correct the Population Register (ANPR according to the Italian acronym, which is the basis of the census count), by applying classification criteria to individual records in the statistical registers.
The definition of administrative 'sign of life' has been included in the General Census Plan (PGC). The classification of signs of life according to their relevance is the following:
- direct (administrative) Signs of Life. Work and study signals, as well as home leases, are considered particularly relevant as proxies of usual residence in Italy and are therefore classified as direct signs of life. Receiving a pension (if the recipient has not moved abroad) or a welfare benefit from the National Social Security System is also considered a direct SoL;
- indirect (administrative) Signs of Life. Fiscal records (tax declarations, tax return filings, etc.) are considered as indirect signs of usual residence in Italy, both through the 'dependent family members box' of the tax return filings (that allows identifying the relationships between the filing holder and the 'spouse' or 'child/children') and through the information on other dependent relatives living with the filing holder or receiving an alimony (such as a spouse legally and actually separated, the children descendants, the parents and so on). Owning a car according to the Cars Public Register or owning a property according to the Real Estate Register are also considered indirect signs of usual residence in Italy;
- other types of (administrative) signs of life are those that can be derived from the household composition according to the population register (PR). As explained below, these SoL are derived for the candidates for over-coverage: in some cases, even if an individual has neither direct nor indirect SoL, he/she might be confirmed as usually resident in Italy based on the situation of his/her fellow household members (e.g. he/she has children who go to school).
Individuals are first classified according to the presence/absence of direct SoL; as a second step, individuals without direct SoL are classified according to the presence/absence of indirect SoL; finally, individuals with neither direct nor indirect SoL are classified according to their household relationships. In order to implement the 12-month criterion, the SoL in the integrated database combining 40 administrative sources (AIDA) are observed over a period of 24 months (the 24 months preceding the census reference date), in search of SoL covering at least 12 months.
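The three-step cascade and the duration check can be sketched as follows. Several points are assumptions made only for illustration: months are indexed 0 to 23 within the observation window, the 12-month criterion is read as "signals in at least 12 distinct months" (not necessarily consecutive), the duration check is applied to direct SoL only, and the field names are hypothetical.

```python
WINDOW_MONTHS = 24   # observation window before the census reference date
MINIMUM_MONTHS = 12  # assumed reading of the 12-month criterion


def has_direct_sol(months_with_signal):
    """True if direct signals fall in at least 12 distinct months of the window."""
    in_window = {m for m in months_with_signal if 0 <= m < WINDOW_MONTHS}
    return len(in_window) >= MINIMUM_MONTHS


def classify(person):
    """Apply the cascade: direct SoL, then indirect SoL, then household links."""
    if has_direct_sol(person["direct_months"]):
        return "direct"
    if person["has_indirect"]:
        return "indirect"
    if person["household_link"]:
        return "household"
    return "none"


# Made-up example profiles.
worker = {"direct_months": range(6, 20), "has_indirect": False, "household_link": False}
car_owner = {"direct_months": [0, 1, 2], "has_indirect": True, "household_link": False}
spouse = {"direct_months": [], "has_indirect": False, "household_link": True}
absent = {"direct_months": [30], "has_indirect": False, "household_link": False}
labels = [classify(p) for p in (worker, car_owner, spouse, absent)]
```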
The time lag between the targeted reference date of the population count and the availability of estimation results is one year (the population count is produced at the end of each year with reference to the end of the previous year).
The integration process involves the processing of data from more than forty administrative archives, each containing basic information on individuals’ SoL (events) and covering several years. Overall, several hundred million records are processed for each reference year, given that each individual can give rise to more than one occurrence in the same source at different times, and that the same individual can appear at the same time in more than one source.
The longitudinal observation of direct SoL over two years makes it possible to capture specific profiles of presence (patterns of continuity) of individuals on the territory. In fact, in order to use SoL as a proxy of usual residence, it is necessary to identify subpopulations whose members are supposed to behave similarly with respect to usual residence. These patterns are crucial for deciding the deterministic rules to be used as classification criteria. These deterministic rules were first established based on expert knowledge, as of the first register-based count in 2020. In 2021 they were improved, thanks to the availability of both survey data and SoL, combining evidence resulting from a statistical model with expert knowledge.
The first step is the identification, for each municipality, of two aggregates: 1) individuals with direct SoL (some of them will be confirmed in the PR, while others will constitute the PR under-coverage) and 2) individuals without direct SoL. The latter are then screened in search of indirect SoL and, as a result, the potential PR over-coverage is identified, i.e. individuals with neither direct nor indirect SoL. However, a further check is performed on this sub-group in order to identify "spouses" or minor children of household reference persons with direct SoL. The outcome of this check either reinforces the absence of SoL resulting from the previous steps (thus determining the definitive classification as over-coverage) or allows the ‘recovery’ of these individuals within the usual residents aggregate. In the latter case, individuals who would otherwise be classified as PR over-coverage because they lack direct or indirect SoL from other sources are considered usual residents, e.g. members of a household with children aged less than 14 attending school in the same municipality.
As a general rule, direct SoL are used for both confirming individuals registered in the PR and identifying the PR under-coverage, while indirect SoL are used only for confirming individuals registered in the PR (i.e. they are not used to identify the PR under-coverage).
Finally, all registered residents of very small municipalities (below 2,000 inhabitants) are confirmed as usual residents, since quality indicators show that in these cases municipal registers are quite accurate, as confirmed also by several exploratory analyses conducted by Istat researchers over the last 5 years. The same is done for people aged 98 years and above, whose usual residence is confirmed due to the high quality of PR data for this subpopulation, and for individuals resident in institutions, registered homeless and individuals registered at formal/informal settlements (identified through an ad hoc survey carried out every year, during which both the addresses and the corresponding population aggregates extracted from the PR are submitted to Municipal Census Offices for back office validation).
As a result of this process, the statistical PR is corrected for both over-coverage and under-coverage i.e.:
- individuals with SoL but not resident according to the PR, are added to the population count (under-covered individuals);
- individuals resident according to the PR but without SoL in other administrative sources are excluded from the population count (over-covered individuals).
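The correction logic described above can be expressed with set operations on person identifiers. The sketch below is a simplified illustration for a single municipality: it ignores the exemptions for small municipalities, people aged 98 and above and hard-to-reach groups, and all function and variable names are hypothetical.

```python
def corrected_population(pr_residents, direct_sol, indirect_sol, household_recovered):
    """Return (population, over_coverage, under_coverage) as sets of identifiers.

    Direct SoL both confirm PR residents and identify under-coverage;
    indirect SoL and household links only confirm PR residents.
    """
    confirmed = pr_residents & (direct_sol | indirect_sol | household_recovered)
    over_coverage = pr_residents - confirmed
    under_coverage = direct_sol - pr_residents
    return confirmed | under_coverage, over_coverage, under_coverage


# Made-up identifiers.
pr_residents = {"a", "b", "c", "d"}
direct_sol = {"a", "e"}        # "e" is not in the PR: under-coverage
indirect_sol = {"b", "f"}      # "f" is not in the PR and is not added
household_recovered = {"c"}    # recovered through household composition
population, over_coverage, under_coverage = corrected_population(
    pr_residents, direct_sol, indirect_sol, household_recovered
)
```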
Finally, it is worth mentioning that the misplacement error of the population register (individuals registered in a municipality who are usually resident in a different one) has not been evaluated so far: individuals resident according to the PR who are included in the population count (i.e. who are confirmed as usually resident on the basis of SoL) are counted in the municipality where they are registered, regardless of the location of their respective SoL.
No further comments.
Additional information is provided in ANNEX 'Further qualitative metadata on the 2021 Population and Housing Census'.
17 December 2024
The information is given separately for each census topic.
The EU programme for the 2021 population and housing censuses includes data on persons, private households, family nuclei, conventional dwellings and living quarters.
The persons enumerated in the 2021 census are those who were usually resident on the Italian territory at the census reference date.
Data are available at different levels of geographical detail in EU countries: national, NUTS2/NUTS3 regions and local administrative units (LAU), grids.
See the following sub-concepts.
There are no particular reasons for census data unreliability.
With reference to the usual residence, the signs of life methodology has been used to estimate the population register over-coverage and under-coverage, while the error (mean squared error) of the register-based count has been measured through the field surveys.
For more details on the methodology used to estimate the data on the different topics, see the following annexes:
- ANNEX attached in section 18.5;
- ANNEX attached in section 19 (Sheet 2. "Data Sources").
Counts of statistical units should be expressed in numbers and, where needed, as rates per inhabitants enumerated in the country.
At the core of the Italian Permanent Population and Housing Census (PPHC) is the Population Register (PR). Together with the Statistical Base Register of Addresses (RSBL) and with the thematic registers on education and employment, PR provides the basis for the production of population census data in a combined census design (census data are produced by using multiple sources).
Sampling data
Two ad hoc sample surveys (Area survey and List survey) are conducted annually to measure the quality of the fully register-based population count and to collect data for variables that cannot be replaced (or can only partially be replaced) by register data.
Therefore, concerning the census outputs, the PPHC produces:
- a fully register-based population count;
- census hypercubes estimated by the joint use of information already available in registers and of data collected on the field, through the use of statistical models.
In the first cycle (2018-2021) of the PPHC, two ad hoc surveys are conducted annually in self-representative municipalities (i.e. those with a population over 17,800 inhabitants and smaller ones which do not rotate in the sampling scheme of the Labour Force Survey) and every four years, according to a rotation scheme, in non-self-representative municipalities (i.e. all the others). In each municipality, a sample of households is selected from the Population Register for the List survey and a sample of addresses from the Address Register for the Area survey.
About 2850 municipalities are involved in the surveys every year, for a total of 1,400,000 households (of which 950,000 for the List survey and 450,000 for the Area survey). In 2021 the number of municipalities involved was higher (4531 out of the total 7903 municipalities in Italy), as the number of non-self-representative municipalities was double the number originally planned (in 2020 the field surveys were cancelled due to the pandemic, so the municipalities due to participate in 2020 were 'moved' to 2021). The households involved in the 2021 surveys were therefore 2,472,400 for the L survey and 776,097 for the A survey. The reference population is the population usually resident in Italy.
Administrative data
For the administrative data sources used to produce the census data, either through an integration process or an estimation model, see:
- ANNEX attached in section 18.1.3 (List of data sources per topic);
- ANNEX attached in section 19 (Sheet 3. "Administrative data sources").
The metadata provided here refer mainly to the 2021 wave of the Permanent Population and Housing Census (end of the first cycle of the PPHC), but the frequency of dissemination is annual (the Permanent Census produces and releases yearly data).
The time lag between the census reference date and the first release of data for the basic socio-demographic characteristics is one year and currently cannot be reduced, taking into account the time necessary for processing both survey and administrative data.
Data on the population by sex, age, citizenship, and education level by municipality were released about one year after the census reference date (15 December 2022). Data on current activity status and on households (no. of households by no. of components) were released in February 2023. The same data at enumeration area level were released in June 2023.
The 15 December 2022 release included data on the hard-to-reach populations (institutional population, homeless and people living in formal/informal camps). Data on conventional dwellings (no. by occupancy status) were released in March 2023. Data on the population by migratory background were released in December 2023. The remaining data will be released in the course of 2024, after transmission of census hypercubes to Eurostat (31 March 2024).
No preliminary data were released.
The data are comparable across all the different geographical areas.
With regard to grid data, the definitions, concepts and classifications adopted are all in line with the provisions of Implementing Regulation 2018/1799.
The data are therefore fully comparable at European level.
The A and L sample surveys were conducted with the same methodology throughout the territory. The data were produced with the same methodology and using nationally available sources.
The data are therefore also fully comparable between Italian regions.
Not applicable.


