Luming Yang;Lin Liu;Jun-Jie Huang;Jiangyong Shi;Shaojing Fu;Yongjun Wang;Jinshu Su
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 10588-10603, published 2025-09-24. DOI: 10.1109/TIFS.2025.3613970. Journal Article; JCR Q1 (Computer Science, Theory & Methods); impact factor 8.0.
https://ieeexplore.ieee.org/document/11177602/
Robustness Matters: Pre-Training Can Enhance the Performance of Encrypted Traffic Analysis
Models with large-scale parameters and pre-training have been leveraged for encrypted traffic analysis. However, existing research has focused primarily on accuracy, often overlooking the role of large-scale pre-trained parameters in enhancing robustness. While machine learning (ML) and deep learning (DL) models trained from scratch can achieve high accuracy, they exhibit limited robustness. When subjected to real-world network noise, their identification results can fluctuate significantly, which is unacceptable. Unfortunately, current robustness evaluation methods neglect sample diversity and employ unreasonable noise settings, so the field still lacks a reasonable quantitative description of model robustness. In this paper, we propose the PA-curve to display the distribution of samples' correct-decision stability, which simultaneously reflects a model's accuracy and robustness. By calculating the area under the PA-curve, called the PA-area, we enable quantitative assessment of robustness for encrypted traffic analysis. Furthermore, we design a pre-trained model based on packet length sequences and pre-train it on TB-scale traffic. After fine-tuning on limited labeled training data, it can perform downstream analysis tasks. We conduct experiments on five encrypted traffic datasets with different tasks. Besides accuracy, we analyze the robustness of the pre-trained model and existing methods under common network disturbances, including packet loss, retransmission, and disorder. Experimental results demonstrate that, compared to ML-based and DL-based models trained from scratch, the pre-trained model not only achieves high accuracy but also exhibits greater resilience to network noise. The source code is available at https://github.com/Shangshu-LAB/BERT-ps
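The three disturbances named in the abstract (packet loss, retransmission, and disorder) can be illustrated on a packet length sequence with a minimal Python sketch. This is not the authors' implementation: the function names, the per-packet `rate` parameter, the adjacent-swap model of disorder, and the `correct_decision_stability` helper are all assumptions made for illustration. The abstract does not give the formal definition of the PA-curve, so the helper only shows one plausible notion of per-sample stability: the fraction of noisy copies of a sample that a classifier still labels correctly.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical type: a perturbation takes (lengths, rate, rng) and returns a new sequence.
Perturbation = Callable[[Sequence[int], float, random.Random], List[int]]


def drop_packets(lengths: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Packet loss (assumed model): remove each packet independently with probability `rate`."""
    return [x for x in lengths if rng.random() >= rate]


def retransmit_packets(lengths: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Retransmission (assumed model): duplicate each packet with probability `rate`."""
    out: List[int] = []
    for x in lengths:
        out.append(x)
        if rng.random() < rate:
            out.append(x)  # the retransmitted copy follows the original
    return out


def reorder_packets(lengths: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Disorder (assumed model): swap a packet with its successor with probability `rate`."""
    out = list(lengths)
    i = 0
    while i < len(out) - 1:
        if rng.random() < rate:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the swapped pair so a packet moves at most one position
        else:
            i += 1
    return out


def correct_decision_stability(
    classify: Callable[[Sequence[int]], str],
    lengths: Sequence[int],
    label: str,
    perturb: Perturbation,
    rate: float,
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of noisy copies of one sample that `classify` still labels correctly.

    Illustrative stand-in only; the paper's PA-curve definition may differ.
    """
    rng = random.Random(seed)
    hits = sum(classify(perturb(lengths, rate, rng)) == label for _ in range(trials))
    return hits / trials
```

In use, `classify` would wrap a trained model; a toy rule such as `lambda s: "bulk" if sum(s) > 1000 else "chat"` is enough to see how stability falls as `rate` grows. Computing the value per sample, rather than averaging accuracy over the whole test set, keeps the differences between samples visible, which is the gap in existing evaluations that the abstract points to.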
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.