The research literature has shown that statistical analysis and machine learning applied to encrypted traffic can reveal the type of activity performed by a user accessing the network, which creates privacy risks. In particular, different types of traffic (e.g., Skype, web access) can be identified by extracting time-based features and feeding them to a classifier. Such privacy attacks are asymmetric: a limited amount of resources (e.g., machine learning algorithms) is enough to extract information from encrypted traffic generated by cryptographic systems that required a significant amount of resources to implement. To mitigate these privacy risks, the literature has proposed a number of techniques, but in most cases only a single technique is applied, which can limit effectiveness. This paper proposes a mitigation approach for privacy risks related to the analysis of encrypted traffic, based on the integration of three main components: (1) a machine learning component that proactively analyzes the encrypted traffic in the network to identify potential privacy threats and evaluate the effectiveness of various mitigation techniques (e.g., obfuscation), (2) a policy-based component in which policies are used to enforce privacy mitigation solutions in the network, and (3) a network node profile component based on the Manufacturer Usage Description (MUD) standard, which enables changes in the network nodes in cases where the first two components are not effective in mitigating the privacy risks. This paper describes the different components and how they interact in a potential deployment scenario.
The approach is evaluated on the public ISCXVPN2016 dataset. The results show that the privacy threat can be mitigated significantly, either by completely preventing the identification of specific types of traffic or by decreasing the probability of their identification, as in the case of VoIP (by 50%), Chat (by 40%) and Browsing (by 33%).
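As an illustration of the time-based features mentioned above and of how obfuscation can disturb them, the following is a minimal sketch in Python. It is not the paper's implementation: the feature names, the `obfuscate` helper, the 20 ms toy flow and the 50 ms delay bound are all assumptions chosen for the example, and no classifier is trained here.

```python
import random
import statistics


def iat_features(timestamps):
    """Return simple inter-arrival-time (IAT) statistics for one flow.

    `timestamps` is a list of packet arrival times in seconds, sorted
    in ascending order. The feature names are illustrative only.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "mean_iat": statistics.mean(gaps),
        "std_iat": statistics.pstdev(gaps),
        "min_iat": min(gaps),
        "max_iat": max(gaps),
    }


def obfuscate(timestamps, max_delay, seed=0):
    """Hypothetical mitigation: delay each packet by a random amount.

    Each packet is shifted by a uniform delay in [0, max_delay] and
    packet order is preserved, so a packet never overtakes the
    previous one.
    """
    rng = random.Random(seed)
    shifted = []
    last = float("-inf")
    for t in timestamps:
        new_t = max(t + rng.uniform(0.0, max_delay), last)
        shifted.append(new_t)
        last = new_t
    return shifted


# Toy flow with one packet every 20 ms (an assumed, VoIP-like pattern).
voip_like = [i * 0.020 for i in range(50)]

original = iat_features(voip_like)
masked = iat_features(obfuscate(voip_like, max_delay=0.050))
```

In this toy flow the regular 20 ms spacing gives an inter-arrival standard deviation close to zero, which is the kind of signature a classifier could exploit. After random delays are added the spacing is no longer regular, so the same feature carries less information about the traffic type, at the cost of extra latency.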