6. Accuracy and reliability |
|
|
6.1. Accuracy - overall |
Main sources of error |
In a sample survey such as FSS 2016, two main kinds of error can be encountered: sampling error and measurement (or response) error. The sampling error is especially high for minor variables for which no precision target was set when defining the sample size (because they are not included in Annex IV of Reg. 1166/2008). |
|
6.2. Sampling error |
Method used for estimation of relative standard errors (RSEs)
|
The survey estimates of totals for regional domains were produced using a direct estimator. The final weight of each unit was obtained in two steps: first, the sampling weight was adjusted for non-response by multiplying it by the inverse of the response rate in each stratum (in a few cases the factor was computed by collapsing two similar strata); second, the weights were calibrated to incorporate auxiliary information, so that the sample estimates are consistent with known population totals. The auxiliary variables used at regional level are: number of holdings, total area, UAA, arable land, permanent crops, bovine animals (heads). The variance estimator for regional and national estimates is defined by formula 5.14 of Estevao, Hidiroglou and Särndal (1995), “Methodological Principles for a Generalized Estimation System at Statistics Canada”, Journal of Official Statistics, vol. 11, no. 2, pp. 181-204, as implemented in the software ReGenesees (available on Istat’s web site: https://www.istat.it/it/metodi-e-strumenti/metodi-e-strumenti-it/elaborazione/strumenti-di-elaborazione/regenesees). |
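As an illustration only, the non-response step of the weighting can be sketched as follows. This is a minimal sketch and not the production code: the record layout (`stratum`, `weight`, `responded`) and the toy data are hypothetical, and the response rate is assumed here to be the unweighted share of respondents in each stratum. The calibration step is not shown.

```python
from collections import defaultdict

def nonresponse_adjusted_weights(units):
    """Return the responding units with weights inflated by 1 / response rate.

    `units` is a list of dicts with the (hypothetical) keys
    'stratum', 'weight' and 'responded'.
    """
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for u in units:
        sampled[u["stratum"]] += 1
        if u["responded"]:
            responded[u["stratum"]] += 1

    adjusted = []
    for u in units:
        if not u["responded"]:
            continue  # non-respondents drop out; respondents carry their weight
        s = u["stratum"]
        # Assumption: unweighted response rate within the stratum.
        response_rate = responded[s] / sampled[s]
        adjusted.append({**u, "weight": u["weight"] / response_rate})
    return adjusted

# Toy data, invented for the example.
sample = [
    {"id": 1, "stratum": "A", "weight": 10.0, "responded": True},
    {"id": 2, "stratum": "A", "weight": 10.0, "responded": True},
    {"id": 3, "stratum": "A", "weight": 10.0, "responded": False},
    {"id": 4, "stratum": "A", "weight": 10.0, "responded": True},
    {"id": 5, "stratum": "B", "weight": 5.0, "responded": True},
    {"id": 6, "stratum": "B", "weight": 5.0, "responded": False},
]
adjusted = nonresponse_adjusted_weights(sample)
design_total = sum(u["weight"] for u in sample)
adjusted_total = sum(u["weight"] for u in adjusted)
```

In this toy case the adjusted weights of the respondents add up to the same total as the design weights of the whole sample, which is the purpose of the adjustment. A stratum with no respondents would receive no adjusted weight at all, which is one reason for collapsing similar strata.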
|
6.2.1. Sampling error - indicators |
1. Relative standard errors (RSEs) - in annex
2. Reasons for possible cases where precision requirements are applicable and estimated RSEs are above the thresholds
|
Registers of agricultural holdings are improving year by year. Nevertheless, we cannot exclude that quality problems persist for some variables. Moreover, the reference year of the sampling frame is not the same as that of the survey. Even though this is the best available sampling frame, a possible consequence of using a non-updated list is that discrepancies are observed between frame and survey data. This is expected, in particular, for some livestock variables (poultry). In some cases the sample was designed to achieve precision on proxy variables that are available at unit level in the register, rather than on the variables of interest themselves. For example, for pasture we considered the aggregate of all pasture, including rough grazings; CVs for this variable appear better than those required by Eurostat. The same holds for the variables on pig categories, for which we considered only the total number of pigs. Moreover, some CVs higher than expected may be due to other factors that we are not able to assess clearly, such as a locally low response rate (both geographically and in terms of variables) or the earthquake that affected some regions. |
Annexes: 6.2.1-1. Relative standard errors |
6.3. Non-sampling error |
Not available. |
6.3.1. Coverage error |
1. Under-coverage errors |
Under-coverage can be roughly estimated from the new units arising from splits (demerging) observed in the sample survey. Among the holdings in the 2016 sample, 129 holdings led to new activities, giving rise to 320 new holdings. The new holdings kept the weight of the original holdings in order to preserve the total land area. |
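A minimal sketch of why keeping the original weight preserves the weighted land total is given below. The function name, the record layout and the figures are hypothetical, and the sketch assumes that the areas of the new holdings add up to the area of the original holding.

```python
def split_holding(parent, child_areas):
    """Create new holdings that inherit the weight of the original holding.

    `parent` is a dict with the (hypothetical) keys 'id', 'weight', 'area_ha'.
    """
    return [
        {"id": f"{parent['id']}-{i}", "weight": parent["weight"], "area_ha": area}
        for i, area in enumerate(child_areas, start=1)
    ]

# Invented example: one sampled holding of 30 ha splits into 18 ha + 12 ha.
parent = {"id": "H1", "weight": 12.0, "area_ha": 30.0}
children = split_holding(parent, [18.0, 12.0])

weighted_area_before = parent["weight"] * parent["area_ha"]
weighted_area_after = sum(c["weight"] * c["area_ha"] for c in children)
```

The weighted area is the same before and after the split, while the weighted number of holdings increases, which is what allows the new units to contribute to a rough estimate of under-coverage.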
2. Over-coverage errors |
Data were corrected for over-coverage: units not belonging to the target population were disregarded and the weights were corrected through calibration. |
2.1 Multiple listings |
Not available. |
3. Misclassification errors |
Units that changed geographical area (NUTS 2 regions) were identified (52 units) and their weights were corrected through calibration.
|
4. Contact errors |
Because of wrong contact data in the list, about 2.7% of the holdings in the sample could not be reached. For the remaining holdings in the sample, the address and, where necessary, the telephone number were updated during the interview. |
5. Other relevant information, if any |
Not available. |
|
6.3.1.1. Over-coverage - rate |
Over-coverage - rate |
The over-coverage rate, computed as the ratio of sample units that do not belong to the target population to the overall sample size, depends on how the units not belonging to the population are defined:
a) considering only the out-of-scope units (exclusively forestry holdings, kitchen gardens only, abandoned land and holdings with no agricultural purpose), we obtain 1 658 units and an over-coverage rate of 4.7%;
b) adding the units that ceased because of splitting/incorporation, we obtain 3 069 units and an over-coverage rate of 8.8%;
c) considering also the temporarily inactive holdings, we obtain 3 588 units and an over-coverage rate of 10.3%. |
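The calculation behind the three rates can be sketched as below. The unit counts are those reported above; the sample size is not stated in this section, so `ASSUMED_SAMPLE_SIZE` is a placeholder chosen for illustration only and is not the actual FSS 2016 sample size.

```python
def over_coverage_rate(units_outside_population, sample_size):
    """Over-coverage rate in percent, rounded to one decimal."""
    return round(100.0 * units_outside_population / sample_size, 1)

# Assumption: placeholder value, NOT the actual sample size of the survey.
ASSUMED_SAMPLE_SIZE = 35_000

# Unit counts taken from definitions a), b) and c) above.
counts = {"a": 1_658, "b": 3_069, "c": 3_588}
rates = {k: over_coverage_rate(n, ASSUMED_SAMPLE_SIZE) for k, n in counts.items()}
```

The three definitions are nested, so the rate can only increase from a) to c); which one is appropriate depends on whether ceased and temporarily inactive holdings are regarded as outside the target population.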
|
6.3.1.2. Common units - proportion |
[Not requested] |
6.3.2. Measurement error |
Characteristics that caused high measurement errors
|
Non-sampling errors can seriously affect the reliability of the final results, particularly in complex surveys such as those on agricultural topics, which demand a considerable effort of memory from the respondent and knowledge of the productive and socio-economic phenomena from the interviewer. To minimise such errors, the following measures were implemented during the data collection phase:
- Interview techniques: interviewers were firmly instructed to put the questions to the interviewee in a way that avoided personal interpretation;
- "Annotations" field of the questionnaire: interviewers were asked to record in it all the information they deemed relevant for validating and analysing the collected data. This avoided having to return questionnaires or to contact the interviewee again to confirm or justify the information.
|
|
6.3.3. Non response error |
1. Unit non-response: reasons, analysis and treatment |
Unit non-response was due to lack of contact, for various reasons: 1- absence of the holder (and of anyone else able to answer the interview); 2- wrong address; 3- refusal; 4- other (illness, judicial measures, etc.). Since the unit non-response rate was quite low, no non-response analysis was carried out to check for possible bias. Unit non-response was corrected by re-weighting (according to the strata to which the non-respondents belong). |
2. Item non-response: characteristics, reasons and treatment |
The electronic questionnaire did not allow the most important questions to be skipped. For the remaining questions (which could be left unanswered), imputation was performed, except for irrigation method and water source when these were missing in the presence of irrigable area (2 124 holdings). This choice was made because we judged that too little information was available for a donor method to give good results. |
|
6.3.3.1. Unit non-response - rate |
Unit non-response - rate |
The unit non-response rate is 6.6%, computed as the ratio of non-responding units to eligible units (according to the Eurostat definition). Non-responding units with unknown eligibility status were treated in the same way as eligible units. |
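A minimal sketch of this calculation is shown below. The function and argument names are hypothetical and the counts are invented for the example; they are not the survey counts. Treating units with unknown eligibility as eligible means that they enter both the numerator and the denominator.

```python
def unit_nonresponse_rate(respondents, eligible_nonrespondents, unknown_eligibility):
    """Unit non-response rate in percent.

    Non-respondents with unknown eligibility are counted as eligible,
    so they appear in both the numerator and the denominator.
    """
    nonrespondents = eligible_nonrespondents + unknown_eligibility
    eligible = respondents + nonrespondents
    return 100.0 * nonrespondents / eligible

# Invented counts, for illustration only.
rate = unit_nonresponse_rate(respondents=900,
                             eligible_nonrespondents=60,
                             unknown_eligibility=40)
```

Counting units of unknown eligibility as eligible gives a conservative (higher) rate than excluding them would.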
|
6.3.3.2. Item non-response - rate |
Item non-response - rate |
Not computed |
|
6.3.4. Processing error |
1. Imputation methods |
As far as item non-response is concerned, the imputation process consisted of several modules:
- selective editing via a mixture model was applied to detect critical units with potentially influential errors, which needed to be corrected interactively. For this purpose SeleMix, an R package developed by Istat, was used;
- a set of deterministic rules (if-then rules) and ad hoc procedures was developed to correct systematic errors. For this purpose ad hoc SAS programs were developed;
- different methods, including mean imputation, hot-deck donor imputation and k-nearest-neighbours imputation (KNN), were developed to resolve specific inconsistencies for a subset of qualitative and quantitative variables. For this purpose SAS and R programs were developed;
- automatic imputation according to the Fellegi and Holt methodology was performed to resolve missing, invalid or inconsistent data arising from random errors. It was performed separately for categorical and quantitative variables.
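To give an idea of how a donor method works, the sketch below implements a simple nearest-neighbour hot-deck (k = 1): each record with a missing value receives the value of the most similar complete record. It is an illustration under stated assumptions, not the SAS and R programs actually used: the variable names and data are hypothetical, the distance is an unscaled squared Euclidean distance, and the matching variables are assumed to be fully observed.

```python
def nearest_donor_impute(records, target, matching_vars):
    """Fill missing `target` values from the closest complete record (donor)."""
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no complete record available as donor")

    def distance(a, b):
        # Assumption: unscaled squared Euclidean distance on matching variables.
        return sum((a[v] - b[v]) ** 2 for v in matching_vars)

    completed = []
    for r in records:
        if r[target] is not None:
            completed.append(dict(r))
            continue
        donor = min(donors, key=lambda d: distance(d, r))
        completed.append({**r, target: donor[target]})
    return completed

# Invented records; 'labour_awu' is missing for holding 3.
holdings = [
    {"id": 1, "uaa_ha": 10.0, "bovine_heads": 5, "labour_awu": 1.0},
    {"id": 2, "uaa_ha": 50.0, "bovine_heads": 40, "labour_awu": 3.0},
    {"id": 3, "uaa_ha": 12.0, "bovine_heads": 6, "labour_awu": None},
]
completed = nearest_donor_impute(holdings, "labour_awu", ["uaa_ha", "bovine_heads"])
```

Holding 3 is closest to holding 1 on the matching variables, so it receives that donor's value. In practice the matching variables would usually be standardised, and donors are often restricted to the same stratum or imputation class.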
|
2. Other sources of processing errors |
Non-sampling errors, including missing, invalid or inconsistent data, which may occur at any stage of the survey process, were corrected following the editing and imputation strategy described in item 6.3.4-1. Different methods were used: selective editing for units with potentially influential errors in one or more variables; deterministic rules to correct systematic errors; ad hoc methods for a subset of qualitative and quantitative variables; automatic editing for random errors. For the ad hoc imputation methods, the choice was based on several factors: type of variable, type of error and availability of auxiliary information. |
3. Tools used and people/organisations authorised to make corrections |
The editing and imputation process was carried out by the Istat division of methodologists responsible for developing editing and imputation strategies for survey data.
Selective editing was applied by means of SeleMix, an R package developed by Istat.
SAS and R programs were developed to implement ad hoc imputation methods and deterministic rules for several qualitative and quantitative variables.
Categorical variables related to farm labour force and affected by random errors were imputed using SCIA, a module implementing the Fellegi and Holt methodology, which is part of the open-source software ConcordJava developed by Istat.
The agricultural land variables affected by random errors were imputed according to the Fellegi and Holt methodology using Banff Processor, an application developed by Statistics Canada.
|
|
6.3.4.1. Imputation - rate |
Imputation - rate |
Not available. |
|
6.3.5. Model assumption error |
[Not requested] |
6.4. Seasonal adjustment |
[Not requested] |
6.5. Data revision - policy |
Data revision - policy |
Preliminary data are not published. Only final estimates are disseminated. |
|
6.6. Data revision - practice |
Data revision - practice |
Preliminary data are not published. Only final estimates are disseminated. |
|
6.6.1. Data revision - average size |
[Not requested] |