Authors: Zhiyuan Li, Yujie Jin
Journal: Computer Networks, volume 270, Article 111499 (impact factor 4.6, JCR Q1)
DOI: 10.1016/j.comnet.2025.111499
Published: 2025-07-07
URL: https://www.sciencedirect.com/science/article/pii/S1389128625004669
EncryptoVision: A dual-modal fusion-based multi-classification model for encrypted traffic recognition
As security, confidentiality, and data privacy technologies develop, fine-grained classification of encrypted traffic has become increasingly important. Existing deep learning methods, including CNN, LSTM, and transformer models, have shown impressive classification performance. However, many of these methods use only the raw packet bytes to generate traffic representations, which can lose crucial information such as dynamic traffic patterns and changes in protocols. In this paper, we propose EncryptoVision, a dual-modal fusion-based multi-classification model for encrypted traffic recognition. First, we transform the encrypted traffic data into three-channel images and incorporate a triplet attention mechanism to enhance the interaction among the three channels. Then, we use a multi-head self-attention mechanism to expand the model’s global receptive field, allowing it to capture more detailed spatial feature information. In addition, we leverage the transformer encoder to extract temporal feature information from the traffic for long-term time series prediction. Finally, we fuse the spatial and temporal features to obtain fine-grained features for multi-classification. Experimental results show that our model outperforms state-of-the-art models in classification performance across four real-world encrypted traffic datasets.
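The abstract does not say how traffic is mapped to three-channel images, so the following is only a minimal sketch of one plausible preprocessing step, not the authors' method. It assumes a flow's leading bytes are truncated or zero-padded to a fixed length and split consecutively into three square channels; the function name `flow_to_three_channel_image` and the `side` parameter are hypothetical.

```python
from typing import List


def flow_to_three_channel_image(payload: bytes, side: int = 4) -> List[List[List[int]]]:
    """Pack the first 3*side*side bytes of a flow into a [3][side][side] image.

    Assumption (not from the paper): channels are filled consecutively and
    short flows are zero-padded. Each pixel is a byte value in 0..255.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    per_channel = side * side
    needed = 3 * per_channel
    # Truncate long flows and zero-pad short ones to a fixed length.
    buf = payload[:needed].ljust(needed, b"\x00")
    image = []
    for c in range(3):
        channel_bytes = buf[c * per_channel:(c + 1) * per_channel]
        # Slice the channel into `side` rows of `side` byte values each.
        rows = [list(channel_bytes[r * side:(r + 1) * side]) for r in range(side)]
        image.append(rows)
    return image
```

An image built this way could then be fed to a convolutional or attention-based spatial branch, while the packet sequence itself feeds the temporal branch; how the paper actually constructs each channel may differ.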
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.