Cheng Fang, Yingkun Liu, Shenglin Teng, Mingrui Yin, Tao Han
DOI: 10.1016/j.comnet.2025.111723
Journal: Computer Networks, Volume 272, Article 111723 (published 2025-09-15)
Full text: https://www.sciencedirect.com/science/article/pii/S1389128625006899
CrossModal-CLIP: A novel multimodal contrastive learning framework for robust network traffic anomaly detection
The rapid proliferation of Internet-connected devices has amplified online activity but also escalated the complexity of network threats. Traditional methods that rely on statistical or raw byte-based analysis often fail to capture the full behavior of network traffic, leading to potential information loss. In this article, a novel method for network anomaly detection using cross-modal contrastive learning is proposed. By effectively fusing intermediate "multimodal" representations of traffic data (byte grayscale images and statistical sequences) via contrastive learning, our method enhances the robustness of traffic representations. A cross-modal Transformer encoder performs the fusion, further strengthening the representation and addressing the limitations of traditional methods. Within the contrastive learning stage, a dynamically increasing temperature coefficient is designed to adjust the pre-training model. Additionally, self-supervised contrastive learning reduces reliance on labeled samples while enhancing feature-extraction capability. Extensive experiments on multiple real datasets validate the effectiveness of our method, demonstrating significant improvements in recall and precision over existing approaches. In addition, leveraging the cross-scenario invariance of contrastive learning, we also apply the pre-trained model in an encrypted-traffic environment to explore its generalization performance.
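The abstract mentions byte grayscale images as one of the two traffic modalities but does not specify how they are built. A minimal sketch of the common practice, mapping each payload byte to one pixel intensity in a fixed-size square image, is shown below; the function name and the 32×32 image size are assumptions, not details from the paper.

```python
import numpy as np

def bytes_to_grayscale(payload: bytes, side: int = 32) -> np.ndarray:
    """Map raw packet/flow bytes to a side x side grayscale image.

    Each byte (0-255) becomes one pixel intensity; the payload is
    truncated or zero-padded to side*side bytes, then reshaped.
    """
    buf = np.frombuffer(payload, dtype=np.uint8)[: side * side]
    buf = np.pad(buf, (0, side * side - buf.size))  # zero-pad short payloads
    return buf.reshape(side, side)
```

An image produced this way can be fed to any standard vision encoder, which is what makes CLIP-style cross-modal training with a second (statistical-sequence) branch straightforward.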
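The abstract also states that a dynamically increasing temperature coefficient adjusts the contrastive pre-training, but gives no formula. The sketch below pairs a hypothetical linear temperature schedule with a CLIP-style symmetric InfoNCE loss over the two modality embeddings; the schedule shape, start/end values, and function names are all assumptions for illustration.

```python
import numpy as np

def dynamic_temperature(step: int, total_steps: int,
                        t_start: float = 0.05, t_end: float = 0.2) -> float:
    """Linearly increase the temperature over training (assumed schedule)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return t_start + (t_end - t_start) * frac

def cross_modal_infonce(z_img: np.ndarray, z_stat: np.ndarray,
                        temperature: float) -> float:
    """Symmetric InfoNCE: matched image/statistics pairs sit on the diagonal."""
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_stat = z_stat / np.linalg.norm(z_stat, axis=1, keepdims=True)
    logits = z_img @ z_stat.T / temperature

    def ce(l: np.ndarray) -> float:
        # cross-entropy with the true pair (diagonal) as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(logp)))

    return (ce(logits) + ce(logits.T)) / 2
```

A lower temperature early in training sharpens the softmax and focuses the model on hard negatives; gradually raising it softens the distribution, which is one plausible reading of "dynamically increasing" in the abstract.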
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.