Robustness Matters: Pre-Training Can Enhance the Performance of Encrypted Traffic Analysis

Impact Factor: 8.0 · CAS Tier 1 (Computer Science) · JCR Q1, Computer Science, Theory & Methods
Luming Yang;Lin Liu;Jun-Jie Huang;Jiangyong Shi;Shaojing Fu;Yongjun Wang;Jinshu Su
DOI: 10.1109/TIFS.2025.3613970
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 10588-10603
Published: 2025-09-24
Full text: https://ieeexplore.ieee.org/document/11177602/
Citations: 0

Abstract

Models with large-scale parameters and pre-training have been leveraged for encrypted traffic analysis. However, existing research has focused primarily on accuracy, often overlooking the role of large-scale pre-trained parameters in enhancing robustness. While machine learning (ML) and deep learning (DL) models trained from scratch can achieve high accuracy, they exhibit limited robustness: when subjected to real-world network noise, their identification results can fluctuate significantly, which is unacceptable. Unfortunately, current robustness evaluation methods neglect sample diversity and employ unrealistic noise settings, and the field still lacks a sound quantitative description of model robustness. In this paper, we propose the PA-curve to display the distribution of samples' correct-decision stability, which simultaneously reflects a model's accuracy and robustness. By calculating the area under the PA-curve, called the PA-area, we enable quantitative assessment of robustness for encrypted traffic analysis. Furthermore, we design a pre-trained model based on packet length sequences and pre-train it on TB-scale traffic; fine-tuned on limited labeled training data, it can perform downstream analysis tasks. We conduct experiments on five encrypted traffic datasets covering different tasks. Beyond accuracy, we analyze the robustness of the pre-trained model and existing methods under common network disturbances, including packet loss, retransmission, and reordering. Experimental results demonstrate that, compared with ML-based and DL-based models trained from scratch, the pre-trained model not only achieves high accuracy but also exhibits greater resilience to network noise. The source code is available at https://github.com/Shangshu-LAB/BERT-ps
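The abstract does not give the PA-curve's exact construction. One plausible reading is that the curve plots, for each stability threshold, the fraction of samples whose correct-decision rate under repeated noisy trials meets that threshold, with the PA-area as its integral. The following is a minimal sketch under that assumption; the names `pa_curve` and `pa_area` and the per-sample stability input are illustrative, not taken from the paper:

```python
import numpy as np

def pa_curve(stability, thresholds=None):
    """Fraction of samples whose correct-decision stability meets each threshold.

    stability[i] is the rate at which sample i is classified correctly
    across repeated noisy perturbations (a value in [0, 1]).
    """
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    stability = np.asarray(stability, dtype=float)
    frac = np.array([(stability >= t).mean() for t in thresholds])
    return thresholds, frac

def pa_area(stability):
    """Area under the PA-curve via the trapezoidal rule.

    1.0 means every sample is always classified correctly under noise;
    lower values indicate lost accuracy and/or unstable decisions.
    """
    t, frac = pa_curve(stability)
    return float(np.sum((frac[1:] + frac[:-1]) / 2.0 * np.diff(t)))

# Example: three samples always correct, one never correct,
# one correct in 60% of noisy trials.
rates = [1.0, 1.0, 1.0, 0.0, 0.6]
area = pa_area(rates)
```

Under this reading, a model that is accurate but unstable and a model that is stable but inaccurate both lose PA-area, which matches the abstract's claim that the metric reflects accuracy and robustness simultaneously.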
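The network disturbances named in the abstract (packet loss, retransmission, reordering) can be simulated directly on a packet-length sequence when probing a model's decision stability. The sketch below is illustrative and not the paper's exact noise model; `perturb` and its rate parameters are assumptions:

```python
import random

def perturb(lengths, loss=0.05, retrans=0.05, swap=0.05, seed=None):
    """Apply illustrative network noise to a packet-length sequence:
    drop packets (loss), duplicate packets (retransmission), and
    swap adjacent packets (reordering)."""
    rng = random.Random(seed)
    out = []
    for pkt in lengths:
        if rng.random() < loss:
            continue                      # packet lost in transit
        out.append(pkt)
        if rng.random() < retrans:
            out.append(pkt)               # retransmitted duplicate
    i = 0
    while i < len(out) - 1:
        if rng.random() < swap:           # out-of-order arrival
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out
```

Running a classifier on many such perturbed copies of each flow yields the per-sample correct-decision rates that a stability distribution like the PA-curve summarizes.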
Source journal: IEEE Transactions on Information Forensics and Security (Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 7.40%
Annual articles: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.