Business preparations for the implementation of validation services

Beyond standard SDMX compliance, the configuration of the validation services is largely business-defined, with requirements formulated by statistical production domain managers. The following must be provided to technical support for a successful service deployment.

  1. Input definitions

    1. Identify datasets and data flows in scope. DSD availability for structural validation is implied by the implemented SDMX standard (relevant code lists, concepts, cube regions and keysets are identified in the process).

    2. Define validation rules. The rules to be observed need to be defined at dataset level. Defining interdependencies between rules that affect the sequence of execution is possible but not mandatory, and the service design accommodates such business decisions.

    3. Define validation rule sets. A minimum of one rule set needs to be defined for each dataset. Validation operations are segmented into four categories (see Section 2.1), which form a structure of logical dependencies. The categories are executed in the following sequence:

      • Basic logical checks

      • Basic content checks

      • General plausibility and consistency checks (within file)

      • Advanced plausibility and consistency checks (across files)

      This logical sequence must be followed. However, categories may be merged or rules regrouped into a different number of rule sets, as long as the overall sequence is not violated and logical coherence is maintained. A rule set prepared for validating a specific dataset does not have to include all rules defined under a rule category.

    4. Define auxiliary datasets (if any) to be called for specific operations (e.g. validations run against databases held by external entities). As of Q4 2016, this is not in scope.

    5. Naming convention for input files. The default procedure follows the existing naming convention in Edamis.
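
The sequencing constraint on rule sets in item 3 above can be checked mechanically before deployment. The following is a minimal sketch only, not part of the service: the `Category` names and the `sequence_is_valid` function are hypothetical, and it assumes each rule set is described simply by the categories of the rules it contains, listed in execution order.

```python
from enum import IntEnum


class Category(IntEnum):
    """Hypothetical labels for the four validation categories, in execution order."""
    BASIC_LOGICAL = 1
    BASIC_CONTENT = 2
    WITHIN_FILE = 3      # general plausibility and consistency checks
    ACROSS_FILES = 4     # advanced plausibility and consistency checks


def sequence_is_valid(rule_sets):
    """Check a dataset's rule sets against the category sequence.

    rule_sets: list of rule sets in execution order; each rule set is a
    list of Category values, one per rule it contains.
    """
    if not rule_sets:
        return False  # at least one rule set is required per dataset
    highest_so_far = 0
    for categories in rule_sets:
        if not categories:
            return False  # assumption: an empty rule set is a configuration error
        # A later rule set must not contain a category that precedes
        # one already executed in an earlier rule set.
        if min(categories) < highest_so_far:
            return False
        highest_so_far = max(highest_so_far, max(categories))
    return True


# Categories 1 and 2 merged into one rule set, category 4 left out: allowed.
merged = [[Category.BASIC_LOGICAL, Category.BASIC_CONTENT], [Category.WITHIN_FILE]]
# Content checks scheduled after within-file checks: violates the sequence.
reversed_order = [[Category.WITHIN_FILE], [Category.BASIC_CONTENT]]

print(sequence_is_valid(merged))
print(sequence_is_valid(reversed_order))
```

The check deliberately accepts merged categories and omitted categories, since both are permitted as long as the overall order is preserved.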

  2. Process control parameter definitions
    Process control parameters are business-defined process control rules and decisions. They drive the logic of how the process should operate in specific scenarios. The following are required:

    1. Severity definitions

      The severity level of failures needs to be defined for each validation rule individually. If the severity of a specific rule can be overridden by the appropriate authority (domain manager), this needs to be indicated.

    2. Asynchronous run time constraints

      A time expectation needs to be set by the business regarding the accepted service response time.

      Technological constraints impose a minimum processing time, and expectations cannot be set below this threshold. The minimum processing time for individual datasets is communicated to domain managers by the technical support units.

    3. Define whether a specific dataset is to be transferred for further processing on completion of the validation process. Business requirements may dictate that specific flows are closed without file transfer.
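
As an illustration of how the process control parameters above might be captured and sanity-checked before being handed to technical support, the sketch below records per-rule severity, the response time expectation and the transfer decision for one dataset. All names (`RuleSeverity`, `ProcessControl`, the field names) and the severity labels are assumptions made for this example; the actual service configuration format is not specified here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RuleSeverity:
    rule_id: str
    severity: str              # hypothetical label, e.g. "ERROR" or "WARNING"
    overridable: bool = False  # True if the domain manager may override it


@dataclass(frozen=True)
class ProcessControl:
    dataset_id: str
    severities: Tuple[RuleSeverity, ...]
    expected_response_s: float   # response time expectation set by the business
    min_processing_s: float      # threshold communicated by technical support
    transfer_on_completion: bool = True  # False closes the flow without file transfer

    def problems(self) -> List[str]:
        """Return a list of configuration problems (empty if none found)."""
        found = []
        if self.expected_response_s < self.min_processing_s:
            found.append("response time expectation is below the minimum processing time")
        rule_ids = [s.rule_id for s in self.severities]
        if len(rule_ids) != len(set(rule_ids)):
            found.append("a rule has more than one severity definition")
        return found


too_fast = ProcessControl(
    dataset_id="EXAMPLE_DATASET",  # placeholder identifier
    severities=(RuleSeverity("R1", "ERROR"), RuleSeverity("R2", "WARNING", overridable=True)),
    expected_response_s=30,
    min_processing_s=60,
)
print(too_fast.problems())
```

Keeping the minimum processing time next to the business expectation makes the threshold violation visible at definition time rather than at run time.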

  3. Output definitions

    1. Define the audience for each validation report type and instance. The same stakeholder needs to be defined for each dataflow output for security and confidentiality reasons.

    2. Preferred structure of outputs

      The content validation report consists of validation results and process metadata, which may be presented separately or as one consolidated report, as per operational preference. When multiple rule sets are executed, a separate report is produced for each validation job instance. STRUVAL and CONVAL results are presented separately.

    3. Preferred content of outputs

      The process manager does not have any control over the granularity of the report. What is sent back by STRUVAL/CONVAL is what is delivered to the data provider.

    4. Naming conventions for output files

      Naming of output files may be configured freely and is defined by the production domain managers. The existing EDIT-based naming convention must be kept.

  4. Definition of users and privilege groups - Security and access

    Users are defined (Eurostat and external).

    Functional role-based access to resources is currently not applicable; role-based differentiation of access rights is defined in an organizational context only (Eurostat internal users vs. external users). Future service configuration, however, may account for segmentation of privilege groups.

  5. Identify acceptance testing resources and test input files.