Yang Miao, Xiaoyan Hu, Guang Cheng, Ruidong Li, Hua Wu, Yang Meng
Computers & Security, Volume 159, Article 104680, published 2025-09-25. DOI: 10.1016/j.cose.2025.104680
Citation count: 0
WEDoHTool: Word embedding based early identification of DoH tunnel tool traffic in dynamic network environments
The DNS over HTTPS (DoH) protocol encapsulates plaintext DNS messages in HTTPS to protect user privacy. However, attackers can exploit various DoH tunnel tools to hide malicious DNS activity or evade detection. Identifying DoH tunnel tool traffic early and accurately is crucial for network security and stability, because it allows targeted countermeasures to be taken. Existing research relies primarily on conventional machine learning or deep learning to detect DoH or DoH tunnel traffic from the statistical features of network flows. Such feature extraction depends on expert knowledge and cannot be performed until a network flow or time window ends, which delays the identification of DoH traffic. In addition, existing methods mainly target stable network environments, and their performance is likely to degrade in dynamic ones. Moreover, identifying the specific DoH tunnel tool behind the traffic, which targeted defense requires, has yet to be studied. Early identification of specific DoH tunnel tools is challenging because their traffic patterns are similar and the network environment is dynamic. To address these concerns, we propose WEDoHTool, an early and accurate method for identifying DoH tunnel tool traffic based on word embedding. For early identification, WEDoHTool extracts, from the first few packets of each unidirectional flow, the length sequence of the initial TLS records that carry application data. It then uses word2vec, a word embedding technique, to efficiently capture the stable and complex relationships and patterns within that sequence. Finally, a two-stage identification module classifies the resulting word2vec embedding vectors. Specifically, WEDoHTool first separates DoH traffic from heavy background traffic with a lightweight TextCNN, and then identifies the specific DoH tool with a Transformer encoder based on the self-attention mechanism.
Experimental results on a combined dataset of CIRA-CIC-DoHBrw-2020 and DoH-Tunnel-Traffic-HKD demonstrate the effectiveness and efficiency of WEDoHTool in detecting DoH traffic and identifying specific DoH tunnel tools in dynamic network environments. In dynamic networks, it maintains an accuracy of at least 98.82% in the first stage and at least 98.07% in the second stage.
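The feature-extraction step of WEDoHTool (lengths of the initial TLS application-data records in the first packets of a unidirectional flow) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the TCP payloads are assumed to arrive in order, without retransmissions, and to start at a TLS record boundary; the cap `max_records=8`, the helper names, and the token format produced by `to_tokens` are hypothetical. The parser reads the standard 5-byte TLS record header (content type, protocol version, length) and keeps the length of every record whose content type is application_data (23).

```python
from typing import Iterable, List

TLS_HEADER_LEN = 5                  # content type (1) + version (2) + length (2)
CONTENT_TYPE_APPLICATION_DATA = 23  # 0x17 in the TLS record layer

def app_data_record_lengths(payloads: Iterable[bytes], max_records: int = 8) -> List[int]:
    """Lengths of the first TLS application_data records in a unidirectional flow.

    Assumes `payloads` are the flow's first TCP payloads, in order, without
    retransmissions, and beginning at a TLS record boundary.
    """
    stream = b"".join(payloads)
    lengths: List[int] = []
    offset = 0
    # A length is usable once its 5-byte header is captured, even if the
    # record body extends past the last captured packet.
    while offset + TLS_HEADER_LEN <= len(stream) and len(lengths) < max_records:
        content_type = stream[offset]
        record_len = int.from_bytes(stream[offset + 3:offset + 5], "big")
        if content_type == CONTENT_TYPE_APPLICATION_DATA:
            lengths.append(record_len)
        offset += TLS_HEADER_LEN + record_len  # jump to the next record header
    return lengths

def to_tokens(lengths: List[int]) -> List[str]:
    """Hypothetical token format: one word2vec 'word' per record length."""
    return [f"len_{n}" for n in lengths]

def record(content_type: int, body_len: int) -> bytes:
    """Build a dummy TLS 1.2-style record (version 0x0303) with a zero-filled body."""
    return bytes([content_type, 0x03, 0x03]) + body_len.to_bytes(2, "big") + bytes(body_len)

# A handshake record (type 22) followed by two application_data records,
# with the first one split across two packets.
handshake, data1, data2 = record(22, 40), record(23, 100), record(23, 33)
packets = [handshake + data1[:50], data1[50:] + data2]
print(to_tokens(app_data_record_lengths(packets)))  # ['len_100', 'len_33']
```

Each token list could then serve as one "sentence" in a word2vec training corpus, for example via gensim's `Word2Vec(sentences=corpus, vector_size=..., window=..., min_count=1)`; the embedding dimension and context window used by WEDoHTool are not stated in the abstract.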
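The two-stage identification module can be pictured as a cascade: a cheap binary detector runs on every flow, and the heavier tool classifier runs only on flows flagged as DoH. The sketch below captures only that control flow. `doh_detector`, `tool_classifier`, the 0.5 threshold, and the toy stand-ins are hypothetical placeholders, not the paper's trained TextCNN and Transformer encoder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Embedding = List[List[float]]  # one word2vec vector per length token

@dataclass
class Verdict:
    is_doh: bool
    tool: Optional[str] = None  # filled in only when stage 1 flags DoH

def two_stage_identify(
    flow: Embedding,
    doh_detector: Callable[[Embedding], float],   # stage 1: DoH score in [0, 1], e.g. a TextCNN
    tool_classifier: Callable[[Embedding], str],  # stage 2: tool label, e.g. a Transformer encoder
    threshold: float = 0.5,
) -> Verdict:
    # Stage 1 sees all traffic, so it should be lightweight.
    if doh_detector(flow) < threshold:
        return Verdict(is_doh=False)
    # Stage 2 runs only on flows that stage 1 flagged as DoH.
    return Verdict(is_doh=True, tool=tool_classifier(flow))

# Toy stand-ins for trained models (illustration only).
def toy_detector(flow: Embedding) -> float:
    return 0.9 if len(flow) >= 3 else 0.1

def toy_tool_classifier(flow: Embedding) -> str:
    return "tool_A" if flow[0][0] > 0 else "tool_B"

print(two_stage_identify([[0.4], [0.1], [0.7]], toy_detector, toy_tool_classifier))
# Verdict(is_doh=True, tool='tool_A')
```

Splitting the decision this way keeps the expensive model off the bulk of background traffic; whether WEDoHTool also passes stage 1 confidence into stage 2 is not stated in the abstract.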
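The second stage relies on the self-attention mechanism of a Transformer encoder. As a reference point, the following is single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, over a sequence of embedding vectors in plain Python. The projection matrices `wq`, `wk`, `wv` are illustrative inputs; a real encoder learns them and adds multiple heads, residual connections, layer normalization, and feed-forward layers, none of which are shown, and the paper's exact encoder configuration is not given in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    # scores[i][j]: how strongly position i attends to position j
    scores = [[sum(qi * kj for qi, kj in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k] for q_row in q]
    weights = [softmax(r) for r in scores]
    # Each output row is an attention-weighted average of the value vectors.
    return matmul(weights, v)

# Three 2-dimensional token embeddings with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # 3 2
```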