Income and living conditions (ilc)

National Reference Metadata in Single Integrated Metadata Structure (SIMS)

Compiling agency: Statistics Poland


Eurostat metadata
Reference metadata
1. Contact
2. Metadata update
3. Statistical presentation
4. Unit of measure
5. Reference Period
6. Institutional Mandate
7. Confidentiality
8. Release policy
9. Frequency of dissemination
10. Accessibility and clarity
11. Quality management
12. Relevance
13. Accuracy
14. Timeliness and punctuality
15. Coherence and comparability
16. Cost and Burden
17. Data revision
18. Statistical processing
19. Comment
Related Metadata
Annexes (including footnotes)



For any question on data and metadata, please contact: Eurostat user support



1. Contact
1.1. Contact organisation

Statistics Poland

1.2. Contact organisation unit

Social Surveys and Labour Market Department

1.5. Contact mail address

kancelariaogolnaGUS@stat.gov.pl

 


2. Metadata update
2.1. Metadata last certified

15 April 2025

2.2. Metadata last posted

15 April 2025

2.3. Metadata last update

15 April 2025


3. Statistical presentation
3.1. Data description

The European Union Statistics on Income and Living Conditions (EU-SILC) is an instrument aiming at collecting timely and comparable cross-sectional and longitudinal multidimensional microdata on income, poverty, social exclusion and living conditions.

In addition, module variables are collected every three years or every six years, or as ad hoc modules on new policy needs.

The EU-SILC instrument provides two types of data:

  1. Cross-sectional data pertaining to a given time or a certain time period with variables on income, poverty, social exclusion and other living conditions;
  2. Longitudinal data pertaining to individual-level changes over time, observed periodically over a four-year rotation scheme (Annex III (2) of Regulation (EU) 2019/1700).

Social exclusion and housing condition information is collected mainly at household level while labour, education and health information is obtained for persons aged 16 and over. The core of the instrument, income at very detailed component level, is mainly collected at personal level.

This instrument is anchored in the European Statistical System (ESS).

 

3.2. Classification system
  • International Standard Classification of Education (ISCED'2011);
  • International Standard Classification of Occupations (ISCO-08);
  • Classification of Economic Activities (NACE Rev.2-2008);
  • Common classification of territorial units for statistics (NUTS 2);
  • SCL - Geographical code list;
  • The recommendations made by the United Nations in the Canberra Group Handbook on Household Income Statistics should also be taken into account.

For more details on the classifications used, please see EU Vocabularies, Eurostat's metadata server or CIRCABC.

3.3. Coverage - sector

Data refer to all private households and individuals living in the private households in the national territory at the time of data collection.

The EU-SILC survey is a key instrument for providing information required by the European Semester and the European Pillar of Social Rights, in particular for income distribution, poverty and social exclusion, as well as various related living conditions and poverty EU policies, such as on child poverty, access to health care and other services, housing, over indebtedness and quality of life. It is also the main source of data for microsimulation purposes and flash estimates of income distribution and poverty rates.

3.4. Statistical concepts and definitions

Statistical concepts and definitions for EU-SILC are specified in Regulation (EU) 2019/1700, Commission Implementing Regulation (EU) 2019/2181, and Commission Implementing Regulation (EU) 2019/2242. Additional information is available in the EU statistics on income and living conditions (EU-SILC) methodology and in the methodological guidelines and description of EU-SILC target variables (see CIRCABC).

Further details are provided in items 5, 15.1, 15.2.2 and 18.3.

3.5. Statistical unit

Statistical units are private households and all persons living in these households who have their usual residence in the Member State. Annex II of Commission Implementing Regulation (EU) 2019/2242 defines specific statistical units per variable and specifies the content of the quality reports on the organisation of a sample survey in the income and living conditions domain pursuant to Regulation (EU) 2019/1700 of the European Parliament and of the Council.

3.6. Statistical population

The target population is private households and all persons composing these households having their usual residence in the Member State.

3.6.1. Reference population

Definitions of reference population, household and household membership

Reference population

The survey unit was a household and all the household members at least 16 years old at the end of the income reference period.

The survey did not cover collective households or institutions.

Private household definition

Household means a person living alone or a group of people having their usual residence in a private household. ‘Multi-person household’ means a group of two or more persons who usually reside together and share income or household expenses with the other household members.
‘One-person household’ means a person who usually resides alone in a separate housing unit but does not join with any of the other occupants to form part of a multi-person household.

Household membership

The household composition accounts for:

  • persons living together and sharing their income and expenditure who have been in the household for at least twelve months (either the real or the intended time of staying in the household should be considered),
  • persons aged up to 18 years (inclusive) who are absent from the household for education purposes and live in boarding houses or private dwellings,
  • persons aged over 18 years who are absent from the household for education purposes, if their stay outside the household is less than twelve months or if they are not financially independent and take part in the household's income and/or expenditure,
  • persons absent from the household because of their occupation, if their earnings contribute to the household's expenditure and they are considered members of the surveyed household (not another one),
  • persons absent from the household at the time of the survey who are staying at education centres, welfare houses or hospitals, if their real or intended stay outside the household is less than twelve months.
3.6.2. Population not covered by the data collection

The sub-populations that are not covered by the data collection: persons living in collective accommodation establishments.

3.7. Reference area

The whole area of Poland.

3.8. Coverage - Time

Reference year 2024. The SILC data are available for the period 2005-2024.

The reference period used for income and non-income variables:

In EU-SILC different reference periods are used.

The income reference period is the last calendar year preceding the survey, while for other variables presented in the tables the reference period is the current situation as well as the twelve-month or one week period before interview.

3.9. Base period

Not applicable.


4. Unit of measure

The data involve several units of measure, depending on the variable. Income variables are transmitted to Eurostat in national currency. For more information, see the methodological guidelines and description of EU-SILC target variables available on CIRCABC.


5. Reference Period

Description of reference period used for incomes

Period for taxes on income and social insurance contributions

The reference period for income tax prepayment and compulsory social insurance contributions is the year 2023. The account clearance with the Treasury Office (including payments and returns) effected in 2022 refers to the income for 2022.

Income reference periods used

The income reference period was the previous calendar year (2023).

Reference period for taxes on wealth

Taxes on wealth paid during the income reference period (2023) were recorded.

Lag between the income ref period and current variables

The fieldwork took place from April to June, so the lag between income variables and other variables is 4 to 6 months.

 


6. Institutional Mandate
6.1. Institutional Mandate - legal acts and other agreements

Regulation (EU) 2019/1700 was published in the Official Journal (OJ) on 10 October 2019, establishing a common framework for European statistics relating to persons and households, based on data at individual level collected from samples (IESS). The Annex to Commission Implementing Regulation (EU) 2019/2180 of 16 December 2019 specifies the detailed arrangements and content for the quality reports pursuant to Regulation (EU) 2019/1700 of the European Parliament and of the Council and Regulation (EU) 2019/2242.

6.2. Institutional Mandate - data sharing

Confidential microdata are not disclosed by Eurostat. Access to confidential microdata for scientific purposes may be granted on the basis of Commission Regulation 557/2013 and Regulation 223/2009 of the European Parliament and the Council on European statistics.


7. Confidentiality
7.1. Confidentiality - policy

The basic document is the Act on official statistics with its amendments. In addition, the CSO prepared a document: PERSONAL DATA PROTECTION POLICY (PODO) - a document describing the internal Personal Data Protection Policy regulating the principles of data processing in units of official statistics services.

7.2. Confidentiality - data treatment

Rules for handling statistical data are set out in an appendix to the internal regulation of the President of Statistics Poland.


8. Release policy
8.1. Release calendar

Please refer to the publication calendar - Polish Public Statistics publicly available on the website of the Central Statistical Office.

Tytułowy plan wydawniczy Głównego Urzędu Statystycznego i Urzędów Statystycznych na rok 2024 (Publishing plan by title of Statistics Poland and the Statistical Offices for 2024)

8.2. Release calendar access

Please refer to the Release calendar - Eurostat (europa.eu) publicly available on the Eurostat’s website.

8.3. Release policy - user access

In line with the Community legal framework and the European Statistics Code of Practice, Eurostat disseminates European statistics on Eurostat's website (see section 10 - 'Accessibility and clarity'), respecting professional independence and in an objective, professional and transparent manner in which all users are treated equitably. The detailed arrangements are governed by the Eurostat protocol on impartial access to Eurostat data for users. Additional information about microdata access is available in EU statistics on income and living conditions - Microdata - Eurostat.


9. Frequency of dissemination

Annual


10. Accessibility and clarity
10.1. Dissemination format - News release

No news release was issued.

10.2. Dissemination format - Publications

Annual bilingual publications are available on the Statistics Poland website.

10.3. Dissemination format - online database

Data from the EU-SILC study are published in publicly available databases:

  • KNOWLEDGE DATABASES - DOMAIN LIVING CONDITIONS

Living conditions of the population (Warunki życia ludności) | Dashboard | DBW.

The Knowledge Database focuses on presenting detailed thematic information (according to classifications, nomenclatures and code lists) for 31 domain areas: Construction, Prices, Demography, Education, Public finances, Maritime and Inland Economy, Energy, Social economy, Municipal and housing infrastructure, Business and consumer tendency, Culture, Forestry, Science and technology, Non-financial enterprises, Industry, National and regional accounts, Family, Agriculture, Labour Market, Internal Market, Information Society, State and protection of environment, Social benefits and assistance, Telecommunication and post, Transport, Tourism and Sport, Living conditions of the population, International exchange, Justice, Wages and salaries, labour costs, Health and healthcare.

Data are generally available for Poland in total, and for some indicators also by voivodships. In the case of demographic data and of local government unit budgets data are available also for lower levels of territorial division.

The length of the time series depends on the availability and consistency of information within each category.

In the database, information is presented in tables, whereas in the "Dashboards" module information is available in the form of interactive charts and maps.

The Knowledge Database is publicly available and free of charge. Access to the Knowledge Database and use of its data are based on an open license (Attribution 4.0 International - CC BY 4.0).

 

 The STRATEG system is a publicly accessible system, which is updated quarterly and designed to facilitate the process of monitoring the development and evaluating the effects of actions undertaken to strengthen social cohesion. The database contains a comprehensive set of key measures to monitor (mainly annual) development at the national level, as well as at lower levels of territorial division. To ensure international comparability, the database also contains the main indicators for the EU, its member states and regions at NUTS 2 level.

The system is also used as a repository of indicators relating to various strategies – starting from the Europe 2020 Strategy of the EU and the most general Long-term National Development Strategy, as well as the Medium-term National Development Strategy, through 9 integrated strategies concerning economic efficiency and innovation, transport, energy security and environment, regional development, human capital, social capital, sustainable development of rural areas, agriculture and fishing industry, efficient state and national security. In addition, the system stores information on indicators for regional strategies, Partnership Agreement, National and Regional Operational Programmes.

Analysis and perception of information are facilitated by data visualisation tools in the form of maps and charts, as well as a comprehensive set of metadata describing the indicators. The system resources also provide additional information, such as links to the most important documents of strategic importance, reports and other publications.

Data sources

Data about indicators available in the system come from official statistics and several dozen other sources, such as scientific institutes, national and regional centres and agencies, databases of international organizations and institutions.

Additional information

Relative figures (indexes, percentages) are generally calculated based on absolute data, expressed with a higher precision than presented in the tables. Owing to rounding, totals may not always correspond to the sum of all figures shown.

10.3.1. Data tables - consultations

KNOWLEDGE DATABASES - DOMAIN LIVING CONDITIONS:

Due to the change of the DBW system and the mechanism for counting visits to the website, the table has changed. The data given below concerns the period from May 15, 2023 to December 31, 2023.

Period    Number of visits
2024      1693

STRATEG - INDICATORS OF DOMAIN LIVING CONDITIONS:

Period    Number of visits
2024      41422

10.4. Dissemination format - microdata access

Information is disseminated through:


Education and Communication Department
00-925 Warsaw, Aleja Niepodległości 208
tel.: (48 22) 608 31 12,

Link to the data request form: Statistical data request form

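Link to the Act on Public Statistics: Announcement of the Marshal of the Sejm of the Republic of Poland of 13 February 2020 on the publication of the consolidated text of the Act on Public Statistics (Obwieszczenie Marszałka Sejmu Rzeczypospolitej Polskiej z dnia 13 lutego 2020 r. w sprawie ogłoszenia jednolitego tekstu ustawy o statystyce publicznej)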

10.5. Dissemination format - other

They are not available.

10.5.1. Metadata - consultations

They are not available.

10.6. Documentation on methodology

The survey documentation (for each year) consists of:

  • survey schedule;
  • DocSILC065 (Methodological guidelines and description of EU-SILC target variables);
  • forms;
  • instructions for the interviewer / survey coordinator;
  • self-administered instructions for the respondent;
  • voivodeship reports on the implementation of the field survey;
  • scope and logical assumptions;
  • assumptions for the CAPI and SIB applications (Survey Information System);
  • SIB statistics on the completeness of the data set;
  • SIB statistics on errors and their explanations;
  • indicator settings;
  • voivodeship reports on the implementation of the study (during the survey) and the national report;
  • result tables;
  • publication (internet).

Currently, a national metadata system made available to external users is in preparation.

The methodological description is available in the annual publication on the website of the Central Statistical Office.

 

10.6.1. Metadata completeness - rate

All required concepts are provided.

10.7. Quality management - documentation

Standards according to ISO 9000 are used in Polish official statistics.


11. Quality management
11.1. Quality assurance

The EU-SILC survey applies the following quality management system procedures:

  • data validation based on assumptions that are part of the survey documentation, performed automatically in applications (procedures: control of the range of values for each variable, logical control, statistical testing procedures, e.g. techniques for the analysis of unusual observations, logical control with statistical control, comparisons with data from previous waves or from other sources);
  • after each survey, a publication is produced (Income and living conditions of the population of Poland - report from the EU-SILC), containing a broad methodological description and the most important results of the survey with their precision;
  • after each survey, the results of income data are compared with the data from National Accounts and HBS;
  • in 2019, a quality review of the EU-SILC survey was carried out based on the checklist for self-assessment of the quality of public statistics surveys (LiKoS). LiKoS is based on the European DESAP Checklist and is a tool that helps survey authors assess the quality of statistics. The list is fully compliant with the quality criteria of the European Statistical System and covers the main aspects relevant to the quality of the statistical data. The result of the review is a report and recommendations with a schedule for their implementation. Directions of change for the coming years: reducing the burden on respondents; preparing a national dataset for national users (which would enable earlier transfer of datasets to data users);
  • before each edition of the survey, substantive training for persons carrying out the EU-SILC survey (interviewers and field coordinators) is conducted by employees of the EU-SILC substantive team at the Central Statistical Office.
11.2. Quality management - assessment

The EU-SILC survey is well aligned with the methodology contained in DocSILC065 (Methodological guidelines and description of EU-SILC target variables). This ensures high comparability of data at the European level.

During the project carried out under the Action plan for EU-SILC improvements, Objective 1: Regional dimension of the EU-SILC data at NUTS 2 level, the survey sample was increased, which improved the precision not only of the main indicators but of the data in general. A remaining problem is the high share of substitute (proxy) interviews in Poland (about 27% of all individual interviews in recent years). Proxy interviews are a compromise between obtaining data from a person close to the respondent and having no interview at all: any information coming from people in the respondent's household is a better solution than imputing all the data.

Proxy interviews do, however, raise concerns for questions that ask for an assessment of various phenomena. Therefore, in Poland, proxy interviews are not allowed for some issues and additional weights are introduced. We are also looking for other solutions. We have introduced self-administration (a paper form for people aged 16 and over is left for people who cannot be reached or do not have time for the interviewer). Thanks to this measure, the proxy interview rate in 2019 decreased by approx. 2 percentage points (to 25%). In 2024, the CAWI method was used for the first time across the entire country for the individual form. Unfortunately, interviewers did not apply this method very widely. Analysis of the voivodeship reports showed that this was due to interviewers' concerns about technical problems with the CAWI application and about failing to obtain an interview from the respondent (fear of the respondent changing his or her mind about the choice of the CAWI method). Interestingly, interviewers who had piloted the method in 2023 used it more widely in 2024 than other interviewers. This means that work (especially during training) is needed to convince interviewers to implement this method.

The comparative analysis of the income results with other studies indicates high comparability for income from contract work and benefits and, to a lesser extent, for income from self-employment and property income. A project is planned under which data from administrative sources will be used to improve the quality of data with lower coverage.

Data from EU-SILC 2023 were transmitted to Eurostat in February 2024.


12. Relevance
12.1. Relevance - User Needs

The main users of EU-SILC statistical data are policy makers, research institutes, media, and students.

12.2. Relevance - User Satisfaction

Eurostat carried out an online general User Satisfaction Survey (USS) in the period between April and July 2019 to obtain a better knowledge about users, considering their needs and satisfaction with the services provided by Eurostat. The survey has shown that EU-SILC is of very high relevance for users. For the majority, both aggregates and micro-data were important or essential in their work irrespective of the purpose of their use. The use of the ad-hoc modules was less widespread than the use of the nucleus variables. Nevertheless, there was high interest to repeat these modules in order to have the possibility of comparing data over time. Users emphasized their strong need for more detailed micro-data, which is currently not possible. Under the new legal framework implemented from 2021, the NUTS 2 division will be available for the main indicators. Finally, users were satisfied with overall quality of the service delivered by Eurostat, which encompasses data quality and the supporting service provided to them.

For more information, please consult the User Satisfaction Survey.

12.3. Completeness

Yearly datasets contain all variables.

Optional variables not introduced:

  • HY030G: Imputed rent (Optional)
  • RL080: Remote education (Optional)
  • HI130G: Interest expenses [not including interest expenses for purchasing the main dwelling] (Optional)
  • HI140G: Household debts (Optional)
12.3.1. Data completeness - rate

In 2024, Poland provided all variables in accordance with the D065 documentation: 100%


13. Accuracy
13.1. Accuracy - overall

According to Reg. (EU) 2019/1700 Annex II, precision requirements for all data sets are expressed in standard errors and are defined as continuous functions of the actual estimates and of the size of the statistical population in a country or in a NUTS 2 region. For the income and living conditions domain, the estimated standard errors of the following indicators are examined according to certain parameters set:

  • Ratio at‐risk‐of‐poverty or social exclusion to population;
  • Ratio of at‐persistent‐risk‐of‐poverty over four years to population;
  • Ratio at‐risk‐of‐poverty or social exclusion to population in each NUTS 2 region.

Further information is provided in section 13.2 Sampling error.

13.2. Sampling error

EU-SILC is a complex survey involving different sampling designs in different countries. In order to harmonize and make sampling errors comparable among countries, Eurostat (with the substantial methodological support of Net-SILC2) has chosen to apply the "linearization" technique coupled with the “ultimate cluster” approach for variance estimation.

Linearization is a technique based on the use of linear approximation to reduce non-linear statistics to a linear form, justified by asymptotic properties of the estimator. This technique can encompass a wide variety of indicators, including EU-SILC indicators. The "ultimate cluster" approach is a simplification consisting in calculating the variance taking into account only variation among Primary Sampling Unit (PSU) totals. This method requires first stage sampling fractions to be small which is nearly always the case. This method allows a great flexibility and simplifies the calculations of variances. It can also be generalized to calculate variance of the differences of one year to another.

The main hypothesis on which the calculations are based is that the "at risk of poverty" threshold is fixed. According to the characteristics and availability of data for different countries, we have used different variables to specify strata and cluster information. 

In particular, countries have been split into 3 groups:

1) BE, BG, CZ, IE, EL, ES, FR, HR, IT, LV, HU, PL, PT, RO, SI, UK and AL, whose sampling design could be assimilated to a two-stage stratified type we used DB050 (primary strata) for strata specification and DB060 (Primary Sampling Unit) for cluster specification;

2) DK, DE, EE, CY, LT, LU, NL, AT, SK, FI, CH whose sampling design could be assimilated to a one stage stratified type we used DB050 for strata specification and DB030 (household ID) for cluster specification;

3) MT, SE, IS, NO, whose sampling design could be assimilated to a simple random sampling, we used DB030 for cluster specification and no strata.
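
As a rough illustration of the two techniques described above, the following Python sketch linearizes a ratio estimator and then applies the ultimate cluster variance formula, which uses only the variation among weighted PSU totals within each stratum. It is a minimal sketch under stated assumptions, not Eurostat's implementation: the function names and the toy records are hypothetical, the finite population correction is ignored, and strata containing a single PSU are simply skipped.

```python
from collections import defaultdict


def linearize_ratio(records):
    """Return the weighted ratio R = Y / X and one linearized value per record.

    Each record is (stratum, psu, weight, y, x). Mapping stratum and psu to
    DB050 and DB060 is an assumption made for this illustration.
    """
    total_y = sum(w * y for _, _, w, y, _ in records)
    total_x = sum(w * x for _, _, w, _, x in records)
    ratio = total_y / total_x
    # First-order (Taylor) linearization of a ratio: u = (y - R * x) / X
    linearized = [(s, p, w, (y - ratio * x) / total_x) for s, p, w, y, x in records]
    return ratio, linearized


def ultimate_cluster_variance(linearized):
    """Variance estimate built only from weighted PSU totals within strata."""
    psu_totals = defaultdict(float)
    for stratum, psu, weight, value in linearized:
        psu_totals[(stratum, psu)] += weight * value

    totals_by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        totals_by_stratum[stratum].append(total)

    variance = 0.0
    for totals in totals_by_stratum.values():
        n_psu = len(totals)
        if n_psu < 2:
            continue  # a single PSU gives no between-PSU variation; skipped here
        mean_total = sum(totals) / n_psu
        squared_dev = sum((t - mean_total) ** 2 for t in totals)
        variance += n_psu / (n_psu - 1) * squared_dev
    return variance


# Hypothetical toy data: (stratum, psu, weight, poverty indicator, population indicator)
sample = [
    ("A", 1, 10.0, 1, 1), ("A", 1, 10.0, 0, 1),
    ("A", 2, 10.0, 1, 1), ("A", 2, 10.0, 1, 1),
    ("B", 1, 5.0, 0, 1), ("B", 2, 5.0, 1, 1),
]
rate, lin = linearize_ratio(sample)
variance = ultimate_cluster_variance(lin)
standard_error = variance ** 0.5
```

For countries in group 2 the same function would apply with the household ID (DB030) in the PSU position, and for group 3 with a single dummy stratum; both are assumptions about how the grouping above would map onto this sketch.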

13.2.1. Sampling error - indicators

The concept of accuracy refers to the precision of estimates computed from a sample rather than from the entire population. Accuracy depends on sample size, sampling design effects and structure of the population under study. In addition to that, sampling errors and non-sampling errors need to be taken into account. Sampling error refers to the variability that occurs at random because of the use of a sample rather than a census and non-sampling errors are errors that occur in all phases of the data collection and production process.

Sampling errors of indicators for the quality report were estimated using ultimate cluster method and linearization. Calibration of the weights was also taken into account. The R package vardpoor was used in the calculations.



Annexes:
PL_2024_Annex 3-Sampling_errors_13.2.1
PL_2024_Annex A EU-SILC - content tables
13.3. Non-sampling error

Non-sampling errors are basically of 4 types:

  • Coverage errors: errors due to divergences existing between the target population and the sampling frame.
  • Measurement errors: errors that occur at the time of data collection. There are a number of sources for these errors such as the survey instrument, the information system, the interviewer and the mode of collection.
  • Processing errors: errors in post-data-collection processes such as data entry, keying, editing and weighting.
  • Non-response errors: errors due to an unsuccessful attempt to obtain the desired information from an eligible unit. Two main types of non-response errors are considered:
    • Unit non-response: refers to absence of information of the whole units (households and/or persons) selected into the sample.
    • Item non-response: refers to the situation where a sample unit has been successfully enumerated, but not all required information has been obtained.
13.3.1. Coverage error

Coverage errors include over-coverage, under-coverage and misclassification:

  • Over-coverage: relates either to wrongly classified units that are in fact out of scope, or to units that do not exist in practice.
  • Under-coverage: refers to units not included in the sampling frame.
  • Misclassification: refers to incorrect classification of units that belong to the target population.
13.3.1.1. Over-coverage - rate

Coverage error

Main problems        Population (sub-population)    Size of error    Comments
Over-coverage        new subsample                  6.3%
Under-coverage                                                       information not available
Misclassification                                                    information not available

13.3.1.2. Common units - proportion

The EU-SILC survey does not use data from administrative sources.

13.3.2. Measurement error

 

Measurement error for cross-sectional data

Source of measurement errors

As with any other statistical survey, EU-SILC may be affected by non-sampling errors, which occur at various stages of the survey and cannot be eliminated completely. These are mainly interviewers' errors at the data collection stage, errors due to respondents misunderstanding questions or giving inaccurate or sometimes even false answers, and errors made at the data recording stage.

Building process of questionnaire

The guidance provided by Eurostat is compared with the questionnaire to check that the correct questions are being asked and that the data being collected are relevant. Questions are always thoroughly translated into Polish and tested via peer review.

Interview training

The organisation and performance of the survey in the field was the responsibility of the regional statistical offices. Most of the interviewers were regular employees of local statistical offices and therefore had experience in conducting surveys. The survey was preceded by a series of training sessions. First, regional survey coordinators were instructed by staff of the Social Surveys and Labour Market Department of Statistics Poland; afterwards, the regional survey coordinators trained interviewers at the regional statistical offices. The interviewers received written instructions on conducting the survey.

Quality control

2016 was the first year in which a mixed interview method and three stages of data control were used. Interviewers could choose between CAPI (direct recording) and a PAPI questionnaire. Data collected with PAPI were subsequently recorded in CAPI. The first stage of control was carried out on the tablets (CAPI). After the work was completed, the data were transmitted over the Internet to the MS SQL server holding the regional database, where the second stage of control was performed by the server application. The server application was used by the staff of Statistical Offices recording the data directly for the national database and by those supervising the regional data preparation; it was published on the CITRIX server and made accessible through client software. The application had a module that allowed work (such as checking, viewing and producing reports) on the national data (from all the voivodships). The completeness of the national file was also checked using Microsoft Visual FoxPro. Additional checks were made with SAS checking programmes.

13.3.3. Non response error

Non-response errors are errors due to an unsuccessful attempt to obtain the desired information from an eligible unit. Two main types of non-response errors are considered:

1) Unit non-response, which refers to the absence of information for whole units (households and/or persons) selected into the sample. According to Annex VI of Regulation (EU) 2019/2242:

  • The household non-response rate (NRh) is computed as follows:

NRh = (1 - (Ra * Rh)) * 100

where Ra is the address contact rate, defined as:

Ra = Number of addresses/selected persons (including phone, mail if applicable) successfully contacted / Number of valid addresses/selected persons (including phone, mail if applicable) selected

and Rh is the proportion of complete household interviews accepted for the database:

Rh = Number of household interviews completed and accepted for the database / Number of eligible households at contacted addresses (including phone, mail if applicable)

  • The individual non-response rate (NRp) is computed as follows:

NRp = (1 - Rp) * 100

where Rp is the proportion of complete personal interviews within the households accepted for the database:

Rp = Number of personal interviews completed / Number of eligible individuals in the households whose interviews were completed and accepted for the database

  • The overall individual non-response rate (*NRp) is computed as follows:

*NRp = (1 - (Ra * Rh * Rp)) * 100

For those Member States where a sample of persons rather than a sample of households (addresses, phones, mails etc.) was selected, the individual non-response rates are calculated for the selected respondent.

2) Item non-response which refers to the situation where a sample unit has been successfully enumerated, but not all the required information has been obtained.

 

13.3.3.1. Unit non-response - rate

Unit non-response rate for cross-sectional

Rate | A | B | C
Address (including phone, mail if applicable) contact rate (Ra) | 97.97 | 94.99 | 100.00
Complete household interviews (Rh) | 89.45 | 90.45 | 91.76
Complete personal interviews (Rp) | 83.37 | 80.32 | 80.32
Household non-response rate (NRh) | 12.36 | 14.08 | 8.24
Individual non-response rate (NRp) | 16.63 | 19.68 | 19.68
Overall individual non-response rate (*NRp) | 26.94 | 31.00 | 26.30

where

A=total (cross-sectional) sample,

B =New sub-sample (new rotational group) introduced for first time in the survey this year,

C= Sub-sample (rotational group) surveyed for last time in the survey this year.
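
The formulas for NRh, NRp and *NRp given above can be checked against the published rates with a short calculation. The following is a minimal sketch in Python; the function name is illustrative and not part of any official tool, and the inputs are the rounded percentages from the table, so results may differ from the published figures in the last decimal place.

```python
def non_response_rates(ra, rh, rp):
    """Return (NRh, NRp, overall NRp) in percent.

    ra, rh, rp are the contact and completion rates expressed in percent,
    as published in the table (assumption: rounded to two decimals).
    """
    ra_f, rh_f, rp_f = ra / 100, rh / 100, rp / 100
    nrh = (1 - ra_f * rh_f) * 100                 # household non-response
    nrp = (1 - rp_f) * 100                        # individual non-response
    overall_nrp = (1 - ra_f * rh_f * rp_f) * 100  # overall individual non-response
    return nrh, nrp, overall_nrp


# Column B (new rotational group): Ra = 94.99, Rh = 90.45, Rp = 80.32
nrh_b, nrp_b, overall_b = non_response_rates(94.99, 90.45, 80.32)
print(round(nrh_b, 2), round(nrp_b, 2), round(overall_b, 2))
```

Because the published rates were presumably computed from unrounded counts, small discrepancies of about 0.01 percentage points against the table are to be expected.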

 

13.3.3.2. Item non-response - rate

The computation of item non-response is essential to fulfil the precision requirements. Item non-response rate is provided for the main income variables both at household and personal level.

Item non-response refers to the situation where a sample unit has been successfully enumerated, but not all the required information has been obtained.

13.3.3.2.1. Item non-response rate by indicator

Calculations for Item non-response rate are included in the Annex.



Annexes:
PL_2024_Annex 2-Item_non_response_13.3.3.2.1
13.3.4. Processing error

 Description of data entry, coding controls and the editing system

Data entry and coding

(if any used)

Editing controls

 In 2024, emphasis was placed on conducting face-to-face interviews. The percentage of interviews conducted using this method was 62.8% (an increase of 11.3 percentage points compared to 2023). The rest were carried out using the CATI method. Most often, the interviews were recorded directly in the CAPI application. Two types of questionnaires were used: individual questionnaires and household questionnaires. Since 2016, a mixed method of interviewing and three stages of data control have been in use. Interviewers can choose between the CAPI questionnaire (direct recording) and the PAPI questionnaire. When PAPI is used, the data are subsequently entered in CAPI. The first stage of control takes place on tablets (CAPI). After completion of the work, the data are sent via the Internet to the MS SQL server hosting the regional database, where the second stage of control takes place. The server application used by the subject-matter staff of the Statistical Offices has a module enabling work (including checking, browsing and producing statements) on the national data (from all voivodships). Before the files are sent to Eurostat, an additional check is carried out with SAS checking programs.
13.3.5. Model assumption error

Not applicable as error modeling has not been applied.


14. Timeliness and punctuality Top
14.1. Timeliness

The EU-SILC content team has been working for several years to accelerate the publication of data compared to the reference period of the data.

14.1.1. Time lag - first result

Data sets for 2024 were sent to Eurostat in December 2024. Unfortunately, it was necessary to submit a correction to the data sets in February and March 2025.

 

14.1.2. Time lag - final result

Final data:

  • income data: 12 months
  • other data: 6 months
14.2. Punctuality

The first publication of EU-SILC 2024 data took place in January 2025.

14.2.1. Punctuality - delivery and publication

The publication "Incomes and living conditions of the population of Poland (report from the EU-SILC survey of 2024)" is scheduled to be published on the website of the Central Statistical Office on December 31, 2025. At the moment there is no risk of missing this deadline.


15. Coherence and comparability Top
15.1. Comparability - geographical

In Poland, the same definitions and forms for the EU-SILC survey apply throughout the country. The methodological instruction for the survey is also prepared centrally. Training for the people carrying out the survey is conducted by the EU-SILC substantive team. In case of doubt, voivodship coordinators consult directly with the members of the substantive team. All of this reduces the possibility of regional errors.

Discrepancies in comparing data at the international level are kept to a minimum by adapting the methodology to the guidelines prepared by Eurostat. Any doubts are consulted with Unit F-4: Income and living conditions - Quality of life team. In the case of income data, minor differences along with the level of comparability are described in section 3.4. Statistical concepts and definitions.

15.1.1. Asymmetry for mirror flow statistics - coefficient

Not applicable.

15.2. Comparability - over time

 See the annex on Break in series.

15.2.1. Length of comparable time series

Methodological changes affecting the comparability of data are described in the Annex.



Annexes:
PL_2024_Annex 8-Breaks in series_15.2-updated
15.2.2. Comparability and deviation from definition for each income variable

Comparability and deviation from definition for each income variable

Income | Identifier | Comparability | Deviation from definition if any
Total hh gross income | HY010 | F |
Total disposable hh income | HY020 | F |
Total disposable hh income before social transfers other than old-age and survivors' benefits | HY022 | F |
Total disposable hh income before all social transfers | HY023 | F |
Income from rental of property or land | HY040 | F |
Family/ Children related allowances | HY050 | L | Assistance for foster families benefit has been qualified to the category of 'Family related allowances'.
Social exclusion payments not elsewhere classified | HY060 | F |
Housing allowances | HY070 | F |
Regular inter-hh cash transfers received | HY080 | F |
Alimonies received | HY081 | F |
Interest, dividends, profit from capital investments in incorporated businesses | HY090 | F |
Interest paid on mortgage | HY100 | F |
Income received by people aged under 16 | HY110 | F |
Regular taxes on wealth | HY120 | F |
Taxes paid on ownership of household main dwelling | HY121 | F |
Regular inter-hh transfers paid | HY130 | F |
Alimonies paid | HY131 | F |
Tax on income and social contributions | HY140 | F |
Repayments/receipts for tax adjustment | HY145 | F |
Value of goods produced for own consumption | HY170 | F |
Cash or near-cash employee income | PY010 | L | This variable does not account for assistance for foster families; since granting the benefit is not connected to quitting the job, this benefit has been qualified to the category of "Family related allowances" (HY050).
Other non-cash employee income | PY020 | F |
Income from private use of company car | PY021 | F |
Employers social insurance contributions | PY030 | F |
Contributions to individual private pension plans | PY035 | F |
Cash profits or losses from self-employment | PY050 | L | The data on income from self-employment were collected in two different ways: the respondents were asked about the company’s costs and profits and also about the amount of money gained from self-employment which was allocated to the household’s expenditure. After a detailed analysis of data it was decided that the income from self-employment would be equal to the amount allocated to the household’s needs.
Pension from individual private plans | PY080 | F |
Unemployment benefits | PY090 | F |
Old-age benefits | PY100 | F |
Survivors benefits | PY110 | L | Death grants are not included in the income because the whole sum is used to cover the cost of the funeral.
Sickness benefits | PY120 | L | Sickness and childcare benefits are not included (a childcare benefit is granted to the working parent of a sick child), because they are paid by the employer and cannot be detached from the income from hired employment. Therefore, they are accounted for in the income from hired employment.
Disability benefits | PY130 | F |
Education-related allowances | PY140 | F |

F= Fully comparable; L= Largely comparable; P= Partly comparable and NC= Not collected.

 

 

15.3. Coherence - cross domain

The coherence of two or more statistical outputs refers to the degree to which the statistical processes, by which they were generated, used the same concepts and harmonised methods. A comparison with external sources for all income target variables and the number of persons who receive income from each ‘income component’ will be provided, where the Member States concerned consider such external data to be sufficiently reliable.

 

Comparison of EU-SILC and HBS results

 

The objective of this section is to compare HBS (Household Budget Survey) and EU-SILC results.

When comparing these two sources we must take the discrepancies into account. The differences are to a great extent caused by methodological differences. The main diverging points are:

  • Different reference periods for income variables: in HBS the reference period is 1 month and, following Eurostat’s recommendation, the annual income is the monthly income multiplied by 12, which in the case of irregular income, such as income from farming, can cause considerable distortions. In EU-SILC the reference period is the calendar year preceding the survey;
  • Different types of income are taken into account: in HBS information is collected on income both in cash and in kind, while in EU-SILC only income in cash is collected (with a few exceptions), which may be important for income from farming and for social benefits other than retirement pay and pension. Moreover, unlike HBS, EU-SILC does not take into account so-called lump sums;
  • Different methods of data collection: in HBS the respondents make records in a so-called diary. They have to determine the data sources themselves, as these are not listed in the diary, which may cause omissions. In EU-SILC each respondent is asked detailed questions. In EU-SILC all missing income data are imputed, while there is no imputation in HBS;
  • Different methods of sample selection: in HBS, dwellings in which all the households refused to participate in the survey are replaced with new ones from a so-called reserve list;
  • Slightly different weighting of results.

 

A comparison of selected income data is provided in the Annex.



Annexes:
Comparison of EU-SILC and HBS results_2024
15.3.1. Coherence - sub annual and annual statistics

Not applicable.

15.3.2. Coherence - National Accounts

Data for comparing EU-SILC results with the National Accounts (RN) are not yet available. The National Accounts data are expected to become available around August 2025, after which this section will be supplemented.

15.4. Coherence - internal

In 2024, no lack of coherence was found in the data collection.


16. Cost and Burden Top

Mean (average) interview duration per household = 31 minutes.

Mean (average) interview duration per person = 28 minutes.

Mean (average) interview duration for selected respondents (if applicable): not applicable for Poland.


17. Data revision Top
17.1. Data revision - policy

The most important results from the EU-SILC survey, together with the methodological description, are presented in the publication Incomes and living conditions of the population of Poland - report from the EU-SILC survey. The methodological description contains information about the changes that were introduced to the survey in relation to previous years.

In 2005-2024, there were no changes in the results after their publication. If this had happened, the following measures would have been applied:

  • Paper publications - errata attached;
  • Internet publications - introducing changes with information about the correction;
  • Databases - introducing changes with information about the correction.

When data sets are made available, persons ordering them are informed, while the scope of the contract is being agreed, about methodological changes in the survey (within the ordered thematic scope) and about the possible lack of comparability of some data resulting from these changes.

17.2. Data revision - practice

In 2005-2024, there were no changes in the results after their publication.

17.2.1. Data revision - average size

In 2024, there are no revisions to report for the statistical process.


18. Statistical processing Top

Detailed information concerning sampling frame, sampling design, sampling units, sampling size, weightings and mode of data collection can be found in this section (please see below). Such information is mainly used for the computation of the accuracy measures.

18.1. Source data

The new subsample for EU-SILC 2024 was selected in November 2023 from the sampling frame updated as of June 30, 2023.

 

18.1.1. Sampling Design

Sampling frame

The sample for EU-SILC 2024 consisted of four panel subsamples.  The samples for EU-SILC 2005 and for the next years were selected from the sampling frame based on the TERYT system, i.e. National Official Register of Territorial Division of the Country. Two kinds of primary sampling units (PSU) were distinguished in the sampling frame:

  •  about 186 000 CEA – census enumeration areas with about 82 dwellings each,
  •  about 35 000 ESD – enumeration statistical districts, with about 439 dwellings each.

The whole territory of Poland is divided into enumeration statistical districts and census enumeration areas.
In EU-SILC census enumeration areas are used as primary sampling units. The secondary sampling units are dwellings. For each census enumeration area a list of dwellings was made up to form the secondary sampling frame. All the households from the selected dwellings are supposed to enter the survey.

The TERYT system is updated annually with respect to the territorial division into statistical districts and census enumeration areas. The lists of dwellings, names of towns, villages and streets are updated. Other changes due to new construction, demolition of buildings and administrative division modifications are also introduced.

The new subsample for EU-SILC 2024 was selected in November 2023 from the sampling frame updated as of June 30, 2023.

Sample design

Type of sampling design

 A two-stage sampling scheme with different selection probabilities at the first stage was used. Primary sampling units (PSU) were census enumeration areas. At the second stage dwellings were selected. All the households from the selected dwellings were supposed to enter the survey. Prior to selection, primary sampling units were stratified.

Stratification and sub stratification criteria

 The strata were the voivodships (NUTS2) and, within the voivodships, primary sampling units were classified by class of locality. In urban areas census areas were grouped by size of town. Big cities formed independent strata, but in the five largest cities districts were treated as strata. In rural areas strata were represented by rural gminas (NUTS5) of a subregion (NUTS3) or of a few neighbouring powiats (NUTS4). Altogether, 211 strata were distinguished for the first year of the survey; in subsequent editions this number was subject to certain modifications resulting from changes in the administrative division.

Sample selection schemes

It was estimated that in the first year of the survey (2005) the sample should comprise about 24 000 dwellings. Proportional allocation of dwellings to particular strata was applied. In the following years, the proportional allocation of newly drawn subsamples between voivodships was modified because of the need to obtain reliable data (compliant with Eurostat recommendations) at the NUTS 2 level. As a consequence, the allocation has become approximately proportional to the square root of the number of dwellings in the population.
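
To illustrate what an allocation proportional to the square root of the number of dwellings means in practice, the following Python sketch distributes a total sample over regions. It is only an illustration under stated assumptions: the region names and dwelling counts are invented, the function name is hypothetical, and simple rounding is used, so the rounded allocations need not sum exactly to the requested total.

```python
import math


def sqrt_allocation(dwellings_by_region, total_sample):
    """Allocate total_sample across regions proportionally to sqrt(N_region)."""
    roots = {region: math.sqrt(n) for region, n in dwellings_by_region.items()}
    total_root = sum(roots.values())
    return {
        region: round(total_sample * root / total_root)
        for region, root in roots.items()
    }


# Invented example: three regions with very different numbers of dwellings
population = {"region_A": 100, "region_B": 400, "region_C": 900}
print(sqrt_allocation(population, 600))
```

Compared with a strictly proportional allocation, the square-root rule shifts sample from large regions to small ones, which is what makes regional (NUTS 2) estimates more reliable for the smaller voivodships.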

 

The number of dwellings selected from a particular stratum (in every NUTS 2 level) was in proportion to the number of dwellings in the stratum. Furthermore, the number of the first-stage units selected from the strata was obtained by dividing the number of dwellings in the sample by the number of dwellings determined for a given class of locality to be selected from the first-stage unit. In towns with at least 100 000 inhabitants 3 dwellings per PSU were selected, in towns with 20-100 thousand inhabitants – 4 dwellings per PSU, in towns with less than 20 000 inhabitants – 5 dwellings per PSU, respectively. In rural areas 6 dwellings were selected from each PSU.
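
The relation between the number of dwellings allocated to a stratum and the number of first-stage units to be drawn can be sketched as follows. The dwellings-per-PSU values are those quoted above; the class labels, the function name and the use of rounding are assumptions made for illustration only.

```python
# Dwellings selected per PSU by class of locality (values from the text above)
DWELLINGS_PER_PSU = {
    "town_100k_or_more": 3,
    "town_20k_to_100k": 4,
    "town_under_20k": 5,
    "rural": 6,
}


def psus_to_select(dwellings_in_stratum_sample, locality_class):
    """Number of first-stage units needed for a stratum (rounded; assumption)."""
    per_psu = DWELLINGS_PER_PSU[locality_class]
    return round(dwellings_in_stratum_sample / per_psu)


print(psus_to_select(120, "rural"))              # 120 dwellings at 6 per PSU
print(psus_to_select(120, "town_100k_or_more"))  # 120 dwellings at 3 per PSU
```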

In the first year of the survey 5912 census areas and 24044 dwellings were selected for the sample. Census areas were selected according to the Hartley-Rao scheme. Prior to selection, census areas were put in random order for each stratum separately and then the determined number of PSUs was selected with probabilities proportionate to the number of dwellings. Then, from each of the selected census areas dwellings were selected using the simple random selection without replacement procedure.
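
The two selection stages described above can be sketched in Python. The first function follows the usual reading of the Hartley-Rao scheme as randomized systematic sampling with probabilities proportional to size: units are shuffled, their sizes are cumulated, and units are picked at a fixed interval from a random start. This is an illustrative sketch, not the production procedure; the function names and the toy frame of census areas are hypothetical, and the sketch assumes no census area is larger than the sampling interval.

```python
import random


def hartley_rao_select(sizes, n, rng):
    """Randomized systematic PPS selection of n units (sizes = dwellings per unit)."""
    order = list(range(len(sizes)))
    rng.shuffle(order)                 # random order within the stratum
    total = sum(sizes)
    step = total / n                   # sampling interval
    if max(sizes) > step:
        raise ValueError("a unit is larger than the sampling interval")
    start = rng.random() * step        # random start in [0, step)
    selected, cumulated, k = [], 0, 0
    for unit in order:
        cumulated += sizes[unit]
        # pick the unit if the k-th selection point falls inside its interval
        while k < n and start + k * step < cumulated:
            selected.append(unit)
            k += 1
    return selected


def select_dwellings(n_dwellings_in_psu, m, rng):
    """Second stage: simple random sample of m dwellings without replacement."""
    return rng.sample(range(n_dwellings_in_psu), m)


rng = random.Random(2024)
psu_sizes = [82, 75, 90, 60, 110, 95, 70, 88]   # toy stratum of census areas
psus = hartley_rao_select(psu_sizes, 3, rng)
sample = {psu: select_dwellings(psu_sizes[psu], 4, rng) for psu in psus}
print(sample)
```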

The selected sample of primary sampling units was divided into four subsamples, equal in size. Starting from 2006 one of the subsamples is eliminated and replaced with a new one, selected independently as described above. In 2024 subsample 2 was replaced by subsample 6 consisting of 2606 census areas and 9201 dwellings.

In 2024, a sample of reserve dwellings was planned for the new subsample (as in previous years), which will make it possible to obtain, in subsequent editions of the survey, an increase in the number of completed interviews within regions (NUTS 2). The larger sample at the NUTS 2 level results from the need to meet the precision requirements for selected indicators, which are analysed by Eurostat [1]. After an analysis of historical data, it was assumed that in the class of locality "over 20 thousand inhabitants", 12 reserve dwellings would be drawn for each address from the main sample; for the class of locality "less than 20 thousand inhabitants", 10 reserve dwellings would be drawn; for the remaining class, rural areas, a random selection of 6 reserve addresses was established.

In determining the size of the new subsample in the regions (NUTS 2 level), a mathematical model was used, which included the following elements:

  • limits on the standard errors of the AROPE indicator (people at risk of poverty or social exclusion) from the Eurostat regulation, which should be met in 2024;
  • a model of the dependence of the estimated standard errors of the AROPE indicator on the number of households with completed interviews in each region;
  • historical data on the completeness rates for the subsamples surveyed in previous years;
  • the expected impact of the planned use of the reserve dwellings.

 

When drawing the new subsample 6 for the 2024 survey, the following additional elements were used to modify the sampling scheme used in previous years:

  • the strata for the first-stage sampling units were defined by regions (NUTS 2), i.e. voivodships, taking into account the division of the Mazowieckie voivodship into two regions: the Warsaw Capital Region and the Mazowieckie Regional Region; then, within regions, by class of locality. Large cities generally constituted independent strata. In Warsaw, Krakow, Lodz, Poznan and Wroclaw, several strata each were created by combining neighbouring districts. Small cities and rural areas, on the other hand, were stratified by sub-region (NUTS 3) with consideration of classes of locality. In defining the strata in rural areas, account was taken of their diverse nature, as defined in the delimitation of rural areas (DOW) introduced by Statistics Poland, which divides rural areas into 4 classes taking into account population density and distance to urban agglomerations. In addition, a group of “agricultural” strata, defined on the basis of the percentage of dwellings with a user of an individual farm, was distinguished. Furthermore, specific “rich” strata were established based on the highest values of average tax income per capita of the municipality (according to PIT tax bases). A total of 238 strata were established, including 101 rural strata; 53 “agricultural” strata and 36 “rich” strata were created;

  • income ranks were assigned to each dwelling address in the frame (the so-called Social Surveys Frame (OBS)) thanks to Statistics Poland’s access to individual tax data from the Ministry of Finance, which allow persons to be identified by their PESEL ID. This made it possible to assign information to the OBS databases at the level of persons and addresses. The administrative data provided and processed by Statistics Poland covered the years 2016-2019 and made it possible to obtain a total annual income for persons. On the basis of unit tax data from the PIT databases for 2019, a set was created in the OBS in which a code with a value from 1 to 10 was assigned to each dwelling address identifier, i.e. a rank based on the deciles of the equivalent income distribution; the equivalent income was calculated by first summing up the total income from PIT for the persons assigned to a given address according to the OBS and then dividing the total income by the square root of the number of persons in the dwelling;

  • the allocation of the sample size between the strata determined earlier for each region was carried out using the algorithm described in Wesołowski, Wieczorkowski (2017) [Wesolowski J., Wieczorkowski R. (2017), An eigenproblem approach to optimal equal-precision sample allocation in subpopulations, Communications in Statistics - Theory and Methods, 46: 5, 2212-2231.], which solves the problem of optimal allocation in a two-stage sampling scheme that theoretically obtains minimal estimates of the relative standard error for the estimator of the mean value of a fixed feature. The new allocation algorithm requires the availability of a certain variable for each elementary sampling unit (i.e. a dwelling) in the frame; the selected variable should be well correlated with the key variables of the study; in the case of the EU-SILC survey, the 'income rank' feature described above was used. The new allocation algorithm also requires that the ratio of the number of randomly drawn second-stage units to the number of first-stage units be specified as a parameter; such parameters were adopted for each region on the basis of data from previous years.
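
The construction of the income rank described in the list above (total PIT income at an address divided by the square root of the number of persons, then ranked into deciles coded 1 to 10) can be sketched as follows. This is a simplified illustration: the address identifiers and incomes are invented, the function names are hypothetical, and deciles are formed here by the position of addresses in the sorted list, which is an assumption since the exact decile definition is not given in the text.

```python
import math


def equivalent_income(total_income, n_persons):
    """Total income at the address divided by the square root of household size."""
    return total_income / math.sqrt(n_persons)


def income_ranks(addresses):
    """Map address id -> rank 1..10 based on deciles of equivalent income.

    addresses: dict of address id -> (total annual income, number of persons).
    """
    equivalent = {
        address: equivalent_income(income, persons)
        for address, (income, persons) in addresses.items()
    }
    ordered = sorted(equivalent, key=equivalent.get)
    n = len(ordered)
    return {address: (10 * i) // n + 1 for i, address in enumerate(ordered)}


# Invented frame of ten single-person addresses with increasing incomes
frame = {f"addr_{k}": (1000 * k, 1) for k in range(1, 11)}
print(income_ranks(frame))
```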

 

 

Sample distribution over time

 In the first year of the survey the selected sample of primary sampling units was divided into four subsamples, equal in size. Starting from 2006, one of the subsamples is eliminated each year and replaced with a new one, selected independently.

Substitution

If the household from the selected dwelling refused to enter the survey, substitution from the reserve sample was applied (only for the new subsample). Since the 2018 survey, addresses on the reserve list have been sorted by the distance between the reserve address and the address from the main sample. This solution was introduced to reduce interviewer burden arising from the travelling time between addresses (in particular in rural areas) and the travel costs when multiple visits to the same address are needed (no contact with the respondent or the need to complete the interview).

 

Concerning the SILC instrument, three different sample size definitions can be applied:

  • the actual sample size which is the number of sampling units selected in the sample
  • the achieved sample size which is the number of observed sampling units (household or individual) with an accepted interview
  • the effective sample size which is defined as the achieved sample size divided by the design effect with regards to the at-risk-of poverty rate indicator

Given that the effective sample size has been already treated in the section dealing with sampling errors, in this section the attention focuses mainly on the achieved sample size.

 

In total 20 022 households were interviewed and included in the dataset.

40 329 persons at the age of 16 years and more completed an individual interview.

48 040 is the number of persons who are members of the households surveyed.

 

The following graphs show comparison of distributions of realized units from new subsample (DB075=2 and DB135=1) according to selected variable (available in the frame from administrative registers), by original and substituted dwellings.  Substituted units accounted for about 50 percent of all realized new subsample units.

Graphical analysis leads to a general conclusion that substituted sample did not make a significant difference compared to original sample. Proper calibration of weights (described in Annex 5) is an additional guarantee of the appropriate quality of estimation.

 

Fig.1. Comparison of distributions of realized units from new subsample according to number of employed persons, by original and substituted dwellings [axis label: Number of employed people in the dwelling]

Fig.2. Comparison of distributions of realized units from new subsample according to number of employed persons, by original and substituted dwellings, for NUTS 2 regions [axis label: Number of employed people in the dwelling]



[1]  Annex II to the Regulation (EU) 2019/1700 of the European Parliament and of the Council establishing a common framework for European statistics relating to persons and households, based on data at individual level collected from samples.

18.1.2. Sampling unit

The first-stage sampling units (primary sampling units - PSUs) were census enumeration areas, while at the second stage dwellings were selected. All the households from the selected dwellings are supposed to enter the survey.

18.1.3. Sampling frame

The sample for EU-SILC 2024 consisted of four panel subsamples.  The samples for EU-SILC 2005 and for the next years were selected from the sampling frame based on the TERYT system, i.e. National Official Register of Territorial Division of the Country. Two kinds of primary sampling units (PSU) were distinguished in the sampling frame:

  • about 186 000 CEA – census enumeration areas with about 82 dwellings each,
  • about 35 000 ESD – enumeration statistical districts, with about 439 dwellings each.

The whole territory of Poland is divided into enumeration statistical districts and census enumeration areas.
In EU-SILC census enumeration areas are used as primary sampling units. The secondary sampling units are dwellings. For each census enumeration area a list of dwellings was made up to form the secondary sampling frame. All the households from the selected dwellings are supposed to enter the survey.

The TERYT system is updated annually with respect to the territorial division into statistical districts and census enumeration areas. The lists of dwellings, names of towns, villages and streets are updated. Other changes due to new construction, demolition of buildings and administrative division modifications are also introduced.

The new subsample for EU-SILC 2024 was selected in November 2023 from the sampling frame updated as of June 30, 2023.

18.2. Frequency of data collection

Data is collected once a year. In Poland, the EU-SILC survey was conducted throughout the country from April 22 to June 28, 2024.

18.3. Data collection

 Mode of data collection

 

 

1-PAPI

2-CAPI

3-CATI

4-CAWI

4-CAWI

Individual level

5-PAPI proxy

6-CAPI proxy

7-CATI proxy

8-CAWI proxy

9-other

% of total

3.4

36.1

20.2

0.0

0.9

0.6

13.0

10.1

0.0

16.6

100% are all PAPI or CAPI or CATI interviews 

 

Description of collecting income variables  
The source or procedure used for the collection of income variables: Income variables are collected based on information provided by the respondent.
The form (gross, net) in which income variables at component level have been obtained: Information on net income, contributions and taxes is collected from the respondent.
The method used for obtaining target variables in the required form: The two amounts (net income and contributions/taxes) are then summed up to gross income.


Annexes:
PL_2024_Annex 4-Data_collection_18.3
18.4. Data validation

Validation is based on predefined rules of two kinds: scope (range) rules and logical rules. The first stage of validation takes place during the interview, the next one during the preparation of the data set. The rules are developed for both cross-sectional and panel data sets. Each flagged case is analysed and, if it is considered a mistake, it is corrected based on information from the respondent. The number of individual errors is monitored. If an error occurs too often, the reason is analysed. The reason may be:

  • a rule that is too restrictive, in which case it is modified;
  • a question misunderstood by the respondents, in which case its wording is modified or, if this cannot be done, other steps are taken. During training for interviewers, attention is paid to these issues and additional explanations are added to the instructions;
  • discrepancies in panel data, which need to be resolved with the respondent. Historical data are loaded into the CAPI application, so panel data are compared already at this stage. The remaining discrepancies are corrected after telephone arrangements with the respondent.
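
The combination of scope and logical checks with monitoring of how often each check fires can be sketched as follows. The record fields and the two rules shown are purely hypothetical examples chosen for illustration (the second reflects that personal interviews apply to persons aged 16 and over); they are not the actual validation rules of the survey.

```python
from collections import Counter

# Hypothetical rules: each maps a rule name to a predicate that is True when the record passes
RULES = {
    # scope (range) check
    "age_in_range": lambda rec: 0 <= rec["age"] <= 120,
    # logical check: personal interviews only for persons aged 16 and over
    "interview_only_16_plus": lambda rec: not rec["personal_interview"] or rec["age"] >= 16,
}


def validate(records):
    """Return the list of (record index, failed rule) and a count per rule."""
    flagged = []
    counts = Counter()
    for index, record in enumerate(records):
        for name, passes in RULES.items():
            if not passes(record):
                flagged.append((index, name))
                counts[name] += 1
    return flagged, counts


records = [
    {"age": 34, "personal_interview": True},
    {"age": 12, "personal_interview": True},    # fails the logical check
    {"age": 130, "personal_interview": False},  # fails the range check
]
flagged, counts = validate(records)
print(flagged, dict(counts))
```

A rule whose count is unusually high would be the kind of signal that prompts a review of whether the rule is too restrictive or the question is misunderstood.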
18.5. Data compilation

The research uses the following processes: data weighting and imputation of missing income data. These processes are described in detail in sections 3.5.1 and 3.5.2.

18.5.1. Imputation - rate

In the case of PL, the imputation and item non-response rates have the same values (all item non-responses that occurred have been imputed).

Data and annex from point 13.3.3.2.1.

18.5.2. Weighting methods

Detailed description in the annex.



Annexes:
PL_2024_Annex 5-Weights
18.5.3. Estimation and imputation
Imputation procedure used: General information about the imputation procedure used is given below the table.
Imputed rent: The variable occurs only in years in which the Housing module is implemented; it was imputed in 2024 (a short description is given below).
Company car: The data (concerning the private use of a company car) cover the estimated amount the respondent gained by using the company car for private purposes. In the case of a missing value (the respondent used the company car but did not estimate the amount gained), imputation is applied using the hot-deck method and regression imputation with simulated residuals.

The EU-SILC methodology requires the imputation of missing income data. The complete file is obtained through the imputation of the missing data.

Imputation is a procedure aimed at ensuring the completeness of a data set by replacing the data which are missing due to the respondent’s refusal to give answers with values that are correct from the formal point of view (imputation values). The imputation values are obtained by means of a formalised procedure (an algorithm) designed in such a way that the generated values reflect, as precisely as possible, the probable values of the missing data given the information included in the data set.

There are several methods of income variable imputation. They can be classified as deterministic or stochastic. In deterministic methods, for a given data set the selected method and the set of explanatory variables (the imputation algorithm) uniquely determine the imputation value for each record. In stochastic methods the imputation value is determined with the use of an error term, which is why, with the same algorithm and the same data file, each run of the algorithm may give slightly different imputation values. Although stochastic methods slightly increase estimator variance (by introducing an additional random error component), they do not distort the variance or the distribution characteristics of the original data, which allows the random error to be estimated correctly. Deterministic imputation reduces the variance of the variable in the file and leads to underestimation of the random error; it also distorts the correlation structure and the variable distribution to a greater extent. In the income data imputation applied in the EU-SILC survey, the preferred methods are those which preserve the distribution characteristics (thus favouring the stochastic methods).
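
The effect on variance described above can be illustrated with a small sketch. It is a simplification under stated assumptions: the "model" is just the mean of the observed values, the income figures are made up, and the function names are hypothetical. It is not the procedure used in the survey.

```python
import random
import statistics

def impute_deterministic(observed, n_missing):
    """Fill every gap with the fitted value (here simply the observed mean)."""
    fitted = statistics.fmean(observed)
    return observed + [fitted] * n_missing

def impute_stochastic(observed, n_missing, rng):
    """Fill each gap with the fitted value plus a randomly drawn empirical residual."""
    fitted = statistics.fmean(observed)
    residuals = [y - fitted for y in observed]
    return observed + [fitted + rng.choice(residuals) for _ in range(n_missing)]

rng = random.Random(2024)
observed = [1800.0, 2100.0, 2500.0, 3200.0, 4100.0, 6000.0]  # made-up incomes
n_missing = 4

var_observed = statistics.pvariance(observed)
var_det = statistics.pvariance(impute_deterministic(observed, n_missing))
var_sto = statistics.pvariance(impute_stochastic(observed, n_missing, rng))
```

In this sketch the deterministic fill always lowers the variance of the completed file relative to the observed data, because the added values carry no spread of their own, while the stochastic fill restores part of that spread.
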

 

The following stochastic methods were used:

  •   Hot-deck method

It involves the replacement of missing data in a record with gaps (the recipient record) with the data collected from a different record (the donor record) randomly selected out of complete (from the point of view of imputed variable) records which meet the specified conditions for similarity with the recipient record.

Auxiliary qualitative categorising variables (explanatory variables), used for grouping records, may be used in the hot-deck method. In this case, a random representative is selected out of the records with matching values of the auxiliary variables. If it is not possible to find a donor with the same values for all the auxiliary variables, the so-called sequential approach is adopted. The categorising variables are ranked from the most to the least significant. If there are no donors, the categorisation is repeated with explanatory variables being left out one by one, starting from the least significant, until a subset containing donors is obtained.

In the case of applying a quantitative categorising variable in the hot-deck method, a breakdown into deciles is used as a categorisation criterion.
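
The sequential hot-deck procedure described above can be sketched as follows. This is a minimal illustration, not the survey's implementation: the variable names (region, sex, edu, income), the function names and the example records are hypothetical, and the decile helper relies on the default method of Python's statistics.quantiles, which is an assumption about how decile boundaries are computed.

```python
import random
import statistics

def decile_group(value, reference_values):
    """Decile class (1..10) of a quantitative categorising variable."""
    cuts = statistics.quantiles(reference_values, n=10)  # 9 cut points
    return 1 + sum(value > c for c in cuts)

def hot_deck(recipient, donors, class_vars, target, rng):
    """Sequential hot-deck imputation.

    class_vars is ordered from the most to the least significant variable.
    If no complete donor matches on all of them, the least significant
    variable is dropped and the search is repeated.
    Returns (imputed value, variables actually used for matching).
    """
    complete = [d for d in donors if d.get(target) is not None]
    for k in range(len(class_vars), -1, -1):
        used = class_vars[:k]
        pool = [d for d in complete if all(d[v] == recipient[v] for v in used)]
        if pool:
            return rng.choice(pool)[target], used
    raise ValueError("no complete donor records available")

# Made-up records for illustration only.
donors = [
    {"region": "N", "sex": "F", "edu": "high", "income": 3000.0},
    {"region": "N", "sex": "M", "edu": "low", "income": 2500.0},
    {"region": "S", "sex": "F", "edu": "high", "income": 4000.0},
    {"region": "N", "sex": "F", "edu": "low", "income": None},
]
recipient = {"region": "N", "sex": "F", "edu": "mid", "income": None}
value, used = hot_deck(recipient, donors, ["region", "sex", "edu"],
                       "income", random.Random(0))
```

In this example no donor matches on all three variables, so the least significant one (edu) is dropped and the donor is drawn from the records that match on region and sex.
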

 

  •   Regression imputation with randomly selected empirical residuals

Auxiliary variables are the explanatory variables of the regression model. The model takes either a linear or power exponential form. It is fitted on the basis of the records which are complete from the point of view of the imputed variable. The imputed value (or its logarithm in the case of transformed models) is the sum of the theoretical value derived from the model and a randomly selected model residual. The set of records from which the residual is selected is restricted to those which are nearest to the imputed record in terms of the theoretical value derived from the model.
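
A minimal sketch of this method is given below. It assumes a log-linear model with a single explanatory variable as a stand-in for the models mentioned above; the function names, the number of nearest records k and the example data are hypothetical and are not taken from the survey.

```python
import math
import random

def fit_ols(x, y):
    """Simple least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def impute_income(x_new, x_obs, income_obs, k, rng):
    """Stochastic regression imputation on the log scale.

    Returns (imputed value, theoretical value from the model).
    """
    log_y = [math.log(v) for v in income_obs]
    a, b = fit_ols(x_obs, log_y)
    fitted = [a + b * xi for xi in x_obs]
    residuals = [ly - f for ly, f in zip(log_y, fitted)]
    f_new = a + b * x_new
    # Restrict the residual pool to the k complete records whose
    # theoretical values are closest to that of the imputed record.
    nearest = sorted(range(len(x_obs)), key=lambda i: abs(fitted[i] - f_new))[:k]
    resid = residuals[rng.choice(nearest)]
    return math.exp(f_new + resid), math.exp(f_new)

# Made-up complete records for illustration only.
experience = [1, 3, 5, 8, 12, 15, 20, 25]
income = [2100, 2600, 2900, 3800, 4300, 5200, 5600, 7400]
value, fitted_value = impute_income(10, experience, income, k=3,
                                    rng=random.Random(7))
```

Returning only fitted_value would correspond to deterministic regression imputation; adding the residual drawn from the restricted pool gives the stochastic variant.
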

 

Out of the deterministic methods the following were applied:

  •   Regression deterministic imputation (the theoretical value from the model is adopted as the imputation value),
  •   Deductive imputation (the imputation value is directly determined on the basis of the relationships between variables).

 

The application of stochastic regression imputation requires a model which describes well the formation of a variable with relatively small variance of an error term and good statistical qualities. With high variance of a random component, there is a danger of getting accidental values which are not typical of the correct part of the dataset. That is why in the cases where in accordance with the assumption referred to above, stochastic imputation is required, the hot-deck method is preferred to regression imputation. This is particularly justified when the number of records for imputation is rather low, or when the number of correct records is too small for a suitable model fitting.

Stochastic regression imputation is most commonly used for incomes from hired employment, when:

  • an important category of income is analysed, i.e. one declared by a significant share of respondents and, if present, usually accounting for a significant share of the household’s total income,
  • this category can be successfully modelled with the use of the variables included in the questionnaire,
  • there is a large (absolute) number of missing values, although their percentage is rather small; a large number of correct records makes it possible to design a well-fitted model.

It is also widely used for income categories other than income from hired work if the income of a given person/household from the previous year is known. In such a case, stochastic regression imputation is treated as the basic method; however, the hot-deck method is also applied when it is difficult to fit an appropriate model.

In view of a relatively wide scope of applications of the stochastic regression imputation, an additional protection against possible effects of insufficient model adequacy was introduced. The residuals are not generated from the distribution of residuals for the whole sample, but they are selected from a restricted subset. Although in an ideal model residuals should be in the form of white noise, showing no trend whatsoever, in reality there may be some trends (systematic elements) retained in the distribution of residuals, which are not detected by the model, e.g. those related to non-linearity of relationships which cannot be removed by any known transformations. In such a case the use of residuals from a restricted range reduces the risk of generating values diverging from the real variable distribution by combining the theoretical value and the residual which would be utterly improbable (in combination with this theoretical value).

Deterministic imputation is applied where missing data concern less significant components of income variables (taxes, social and health insurance contributions, supplements, etc.) in a situation where the main component is known. In such cases deterministic regression imputation is usually applied. The conversion of a gross value into a net value and vice versa is performed using deterministic regression imputation, if it proves necessary due to missing data. Deductive imputation is employed in rare cases of obvious relationships and can be treated as a supplementary stage of data editing.

The explanatory variables in the models and the grouping variables in the case of the hot-deck method have been selected so as to represent the relationships which, according to logic and knowledge about the phenomena studied, should occur in the data set, taking into account the availability of potential variables in the questionnaire. The relationships have been tested on the file of correct data and in the majority of cases they proved to be significant. Some of the explanatory variables have been retained even if their impact on the imputed variable has not been statistically confirmed, provided that they express an economically important relationship or provide a grouping condition (interpretation criterion) in the calculation algorithm for variables.

For the persons and households not surveyed in the previous year (a new sample, new household members, persons who could not be interviewed previously) or for those who did not gain a particular type of income in the previous year, explanatory variables derived from the current data file are applied. Wherever the same type of income is found in the data for the previous year, its value is treated as the main explanatory (categorizing) variable, both in the case of variables subjected to regression imputation and the hot-deck method. The current variables may be treated as additional explanatory variables.

Deductive imputation has been applied since the 2023 edition of EU-SILC. It is used for benefits whose amount can be determined with very high reliability from information provided by the respondent on the receipt of the benefit and from other qualitative characteristics collected in the survey that describe the respondent and his/her situation. In such cases, the amount obtained by deductive imputation is not flagged as imputed but as a value obtained from the survey. This approach was developed as part of a project implemented with the support of EC funds under the grant agreement 101052514-2021-PL-ILC-SILC.
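
The idea of deductive imputation for benefits can be sketched as follows. The rule shown (a flat monthly rate per eligible child), the field names, the flag labels and the rate used in the example are all hypothetical; the actual benefits, rules and flag codes used in the survey are not described in this section.

```python
def deduce_benefit(record, monthly_rate):
    """Deduce an annual benefit amount from qualitative answers.

    Returns (amount, flag). The flag labels are illustrative only.
    """
    if record.get("receives_benefit") is not True:
        return None, "not_applicable"
    needed = ("eligible_children", "months_received")
    if any(record.get(key) is None for key in needed):
        # The rule cannot be applied; leave the value for statistical imputation.
        return None, "needs_statistical_imputation"
    amount = monthly_rate * record["eligible_children"] * record["months_received"]
    # Following the text above, a deduced amount is flagged as a value
    # obtained from the survey, not as an imputed one.
    return amount, "collected"

# Made-up record and rate for illustration only.
record = {"receives_benefit": True, "eligible_children": 2, "months_received": 12}
amount, flag = deduce_benefit(record, monthly_rate=100.0)
```

Records for which the rule cannot be applied are passed on to the statistical methods described earlier in this section.
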

Imputed rent is estimated using a regression model. The first step consists in the estimation of a hedonic price function according to which actual rents paid by tenants depend on the main characteristics of dwellings. In the second step, the imputed rent is calculated using the model for all households that do not pay rent at the market price. The variable of interest in the model is the monthly rent per square metre of usable dwelling area. The model has an exponential specification (estimated on logarithms).
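
The two steps can be sketched as follows. This is an illustration under stated assumptions: the dwelling characteristics (number of rooms and a city indicator), the function names and the data are hypothetical, the model is a plain least-squares fit on logarithms, and no adjustment is made when transforming back from the log scale. The actual set of characteristics used in the survey is not given in this section.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_hedonic(features, rent_per_m2):
    """Step 1: least-squares fit of log(rent per m2) on dwelling characteristics."""
    X = [[1.0] + list(f) for f in features]  # intercept column first
    y = [math.log(r) for r in rent_per_m2]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def imputed_monthly_rent(beta, feature, area_m2):
    """Step 2: predicted rent per m2 for a non-tenant dwelling times its area."""
    log_rent = beta[0] + sum(b * f for b, f in zip(beta[1:], feature))
    return math.exp(log_rent) * area_m2

# Made-up tenant data: (number of rooms, located in a city: 1/0).
tenant_features = [(1, 0), (2, 0), (3, 1), (4, 1), (2, 1), (5, 0)]
tenant_rent_per_m2 = [28.0, 25.0, 41.0, 37.0, 44.0, 20.0]
beta = fit_hedonic(tenant_features, tenant_rent_per_m2)
rent = imputed_monthly_rent(beta, feature=(3, 0), area_m2=60.0)
```

The model is fitted only on households paying market rent and then applied to the dwellings of households that do not, which mirrors the two steps described above.
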

 

18.6. Adjustment

Not applicable.

18.6.1. Seasonal adjustment

Not applicable.


19. Comment

No comment.



Annexes:
PL_2024_Annex A EU-SILC - content tables
PL_2024_Annex 9-Rolling module
PL_2024_Annex 8-Breaks in series_15.2-updated
HOUSEHOLD QUESTIONNAIRE_2024
PERSONAL QUESTIONNAIRE_2024


Related metadata


Annexes