WEDoHTool: Word embedding based early identification of DoH tunnel tool traffic in dynamic network environments

IF 5.4 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Yang Miao, Xiaoyan Hu, Guang Cheng, Ruidong Li, Hua Wu, Yang Meng
Computers & Security, Volume 159, Article 104680. Published 2025-09-25. DOI: 10.1016/j.cose.2025.104680
https://www.sciencedirect.com/science/article/pii/S0167404825003694
Citations: 0

Abstract

The DNS over HTTPS (DoH) protocol encapsulates plaintext DNS in HTTPS to protect user privacy. However, attackers can exploit various DoH tunnel tools to hide malicious DNS activity or evade detection. Identifying DoH tunnel tool traffic early and accurately is crucial for network security and stability, because it enables targeted countermeasures. Existing research primarily relies on conventional machine learning or deep learning to detect DoH or DoH tunnel traffic from the statistical features of network flows. This feature extraction depends on expert experience and cannot be performed until a network flow or time window ends, which delays the identification of DoH traffic. In addition, existing methods primarily focus on stable network environments, and their performance likely degrades in dynamic network environments. Moreover, no work has yet addressed identifying the traffic of specific DoH tunnel tools for targeted defense. Early identification of specific DoH tunnel tools with similar traffic patterns in dynamic network environments is challenging. To address these concerns, we propose WEDoHTool, an early and accurate DoH tunnel tool traffic identification method based on word embedding. For early identification, WEDoHTool extracts the length sequence of the initial TLS records carrying application data from the first several packets of each unidirectional flow. It then employs word2vec, a word embedding technique, to efficiently capture the stable and complex relationships and patterns within the sequence. Finally, it classifies the embedding vector produced by word2vec with a two-stage identification module: a lightweight TextCNN first filters DoH traffic out of heavy background traffic, and a Transformer encoder with a self-attention mechanism then identifies the specific DoH tunnel tool. Experimental results on a combined dataset consisting of CIRA-CIC-DoHBrw-2020 and DoH-Tunnel-Traffic-HKD demonstrate the effectiveness and efficiency of WEDoHTool in detecting DoH traffic and identifying specific DoH tunnel tools in dynamic network environments. It maintains accuracies of at least 98.82% and 98.07% in dynamic networks at the two stages, respectively.
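The first step described in the abstract, collecting the lengths of the initial application-data TLS records of a flow and treating them as "words" for an embedding model, might look roughly like the sketch below. This is an illustration only, not the authors' implementation: it assumes the TCP payload of one direction has already been reassembled into a byte string, the function names and the `max_records` cutoff are hypothetical, and the abstract does not say how lengths are tokenized (one token per distinct length is assumed here). Note also that on the wire TLS 1.3 labels encrypted handshake records as application data, which this sketch does not try to separate.

```python
import struct
from typing import List

TLS_APPLICATION_DATA = 23  # TLS record content type for application data
TLS_HEADER_LEN = 5         # content type (1 byte) + version (2) + length (2)


def app_data_record_lengths(stream: bytes, max_records: int = 8) -> List[int]:
    """Return the lengths of the first `max_records` application-data records.

    `stream` is assumed to be the reassembled payload of one flow direction.
    `max_records` is a hypothetical cutoff; the abstract only says that
    "several initial packets" are used.
    """
    lengths: List[int] = []
    offset = 0
    while offset + TLS_HEADER_LEN <= len(stream) and len(lengths) < max_records:
        # Record header: type, protocol version, payload length (big-endian).
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type == TLS_APPLICATION_DATA:
            lengths.append(length)
        # Skip to the next record header, whatever the record type was.
        offset += TLS_HEADER_LEN + length
    return lengths


def to_tokens(lengths: List[int]) -> List[str]:
    """Map each record length to a string token, i.e. one 'word' per length."""
    return [str(n) for n in lengths]


def make_record(content_type: int, payload_len: int) -> bytes:
    """Build a synthetic TLS record with a zero-filled payload (demo helper)."""
    return struct.pack("!BHH", content_type, 0x0303, payload_len) + bytes(payload_len)


if __name__ == "__main__":
    # One handshake record (type 22) followed by three application-data records.
    demo = (make_record(22, 90) + make_record(23, 120)
            + make_record(23, 46) + make_record(23, 300))
    print(to_tokens(app_data_record_lengths(demo, max_records=2)))
```

The resulting token lists, one per unidirectional flow, play the role of sentences for a word embedding model such as word2vec; the embedding and the two-stage TextCNN and Transformer classifier are outside the scope of this sketch.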
Source journal: Computers & Security (Engineering and Technology, Computer Science: Information Systems)
CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months
Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.