DoH Insight: detecting DNS over HTTPS by machine learning
Dmitrii Vekshin, Karel Hynek, T. Čejka
Proceedings of the 15th International Conference on Availability, Reliability and Security, published 2020-08-25
DOI: 10.1145/3407023.3409192 (https://doi.org/10.1145/3407023.3409192)
Citations: 44
Abstract
Over the past few years, a new protocol, DNS over HTTPS (DoH), has been created to improve users' privacy on the internet. DoH can be used instead of traditional DNS for domain name resolution, with the added benefit of encryption. This new feature also brings threats, because various security tools depend on readable DNS information to identify, e.g., malware, botnet communication, and data exfiltration. Therefore, this paper focuses on the possibilities of encrypted traffic analysis, especially the accurate recognition of DoH. The aim is to evaluate what information (if any) can be gained from HTTPS extended IP flow data using machine learning. We evaluated five popular ML methods to find the best DoH classifiers. The experiments show that DoH is recognized with an accuracy of over 99.9 %. Additionally, it is possible to identify the application used for DoH communication since, using the datasets we created, we discovered significant differences in the behavior of Firefox, Chrome, and cloudflared. Our trained classifier can distinguish between these DoH clients with 99.9 % accuracy.
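The abstract does not name the flow features or the five ML methods, so the following is only a minimal sketch, under stated assumptions, of the general idea of labeling flows from flow-level statistics. Everything in it is hypothetical: the feature set (mean and population standard deviation of packet sizes plus packet count), the nearest-centroid classifier (a deliberately simple stand-in, not one of the paper's methods), the function names, and the toy packet sizes, which are invented values rather than measurements from the paper's datasets.

```python
from math import dist
from statistics import mean, pstdev


def flow_features(packet_sizes):
    """Hypothetical per-flow features: mean size, size spread, packet count.

    This feature set is an illustrative assumption, not the paper's.
    """
    return (mean(packet_sizes), pstdev(packet_sizes), float(len(packet_sizes)))


def train_centroids(flows, labels):
    """Nearest-centroid training: average feature vector for each class label."""
    sums, counts = {}, {}
    for sizes, label in zip(flows, labels):
        feats = flow_features(sizes)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}


def predict(centroids, packet_sizes):
    """Assign the label whose centroid is closest in (unscaled) Euclidean distance."""
    feats = flow_features(packet_sizes)
    return min(centroids, key=lambda lab: dist(feats, centroids[lab]))


def accuracy(centroids, flows, labels):
    """Fraction of flows whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == y for f, y in zip(flows, labels))
    return correct / len(labels)


# Invented toy packet sizes (bytes) for illustration only.
TRAIN_FLOWS = [
    [100, 180, 120, 170, 110, 190],
    [90, 200, 130, 160, 100, 180],
    [600, 1400, 1500, 1500, 1500, 800],
    [500, 1500, 1500, 1400, 1500],
]
TRAIN_LABELS = ["doh", "doh", "https", "https"]

centroids = train_centroids(TRAIN_FLOWS, TRAIN_LABELS)
print(predict(centroids, [110, 175, 125, 185]))
print(accuracy(centroids, TRAIN_FLOWS, TRAIN_LABELS))
```

Because the classifier works with arbitrary label strings, the same sketch extends to the client-identification task (for example labels such as "firefox", "chrome", "cloudflared"). A realistic pipeline would normalize features before computing distances and use a stronger model evaluated on held-out data.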