WEDoHTool: Word embedding based early identification of DoH tunnel tool traffic in dynamic network environments
Yang Miao, Xiaoyan Hu, Guang Cheng, Ruidong Li, Hua Wu, Yang Meng
Computers & Security, Volume 159, Article 104680. Published 2025-09-25. Impact factor: 5.4; JCR: Q1 (Computer Science, Information Systems).
DOI: 10.1016/j.cose.2025.104680
https://www.sciencedirect.com/science/article/pii/S0167404825003694
Citation count: 0
Abstract
The DNS over HTTPS (DoH) protocol encapsulates plaintext DNS in HTTPS to protect user privacy. However, attackers can exploit various DoH tunnel tools to hide malicious DNS activity or evade detection. Identifying DoH tunnel tool traffic early and accurately is crucial for network security and stability, because it enables targeted countermeasures. Existing research primarily relies on conventional machine learning or deep learning to detect DoH or DoH tunnel traffic from the statistical features of network flows. This feature extraction depends on expert experience and cannot be performed until a network flow or time window ends, which delays the identification of DoH traffic. In addition, existing methods primarily focus on stable network environments, and their performance likely degrades in dynamic network environments. Moreover, work has yet to be done on identifying specific DoH tunnel tool traffic for targeted defense. Early identification of specific DoH tunnel tools with similar traffic patterns in dynamic network environments is challenging. To address these concerns, we propose WEDoHTool, an early and accurate DoH tunnel tool traffic identification method based on word embedding technology. For early identification, WEDoHTool extracts the length sequence of the initial TLS records carrying application data from the first several packets of each unidirectional flow. It then employs word2vec, a word embedding technology, to efficiently capture the stable and complex relationships and patterns within the sequence. Finally, it classifies the embedding vectors produced by word2vec with a two-stage identification module. Specifically, WEDoHTool first filters DoH traffic out of heavy background traffic with a lightweight TextCNN and then identifies the specific DoH tools with a Transformer encoder based on the self-attention mechanism. Experimental results on a combined dataset consisting of CIRA-CIC-DoHBrw-2020 and DoH-Tunnel-Traffic-HKD demonstrate the effectiveness and efficiency of WEDoHTool in detecting DoH traffic and identifying specific DoH tunnel tools in dynamic network environments. It maintains accuracies of at least 98.82% and 98.07% in dynamic networks at the two stages, respectively.
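The first step described in the abstract, collecting the lengths of the initial application-data TLS records of one flow direction and treating them as "words", might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the cutoff of 8 records, and the assumption that the input is an already reassembled byte stream of a single direction are all hypothetical. Only the TLS record header layout (1-byte content type, 2-byte version, 2-byte length) and the application-data content type value 23 come from the TLS specification.

```python
import struct
from typing import List

TLS_APPLICATION_DATA = 23  # TLS record content type for application data
TLS_HEADER_LEN = 5         # content type (1) + version (2) + length (2)


def app_data_record_lengths(stream: bytes, max_records: int = 8) -> List[int]:
    """Return the lengths of the first application-data TLS records.

    `stream` is assumed to be the reassembled payload of ONE flow direction.
    `max_records` is a hypothetical cutoff; the abstract only says the
    records come from several initial packets.
    """
    lengths: List[int] = []
    offset = 0
    while offset + TLS_HEADER_LEN <= len(stream) and len(lengths) < max_records:
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type == TLS_APPLICATION_DATA:
            # The length is read from the header, so a record whose body has
            # not fully arrived yet is still counted.
            lengths.append(length)
        offset += TLS_HEADER_LEN + length  # jump to the next record header
    return lengths


def to_tokens(lengths: List[int]) -> List[str]:
    """Turn each record length into a string token, i.e. one 'word'."""
    return [str(n) for n in lengths]


def make_record(content_type: int, payload_len: int) -> bytes:
    """Build a synthetic TLS record with a zero-filled body (demo only)."""
    return struct.pack("!BHH", content_type, 0x0303, payload_len) + bytes(payload_len)


# One handshake record (type 22) followed by two application-data records.
stream = make_record(22, 90) + make_record(23, 64) + make_record(23, 120)
print(to_tokens(app_data_record_lengths(stream)))  # prints ['64', '120']
```

Each flow then yields a short "sentence" of length tokens that a word2vec implementation could be trained on, with the resulting vectors passed to the two classification stages. Note that in TLS 1.3 encrypted handshake messages also appear on the wire with content type 23, so a real extractor may need extra logic to skip them.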
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.