ResDNViT: A hybrid architecture for Netflow-based attack detection using a residual dense network and Vision Transformer
Authors: Hassan Wasswa, Hussein A. Abbass, Timothy Lynar
Journal: Expert Systems with Applications, volume 282, Article 127504
Publication date: 2025-04-16
DOI: 10.1016/j.eswa.2025.127504
URL: https://www.sciencedirect.com/science/article/pii/S0957417425011261
Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
The rapid evolution of technologies such as wireless sensor networks, cloud computing services, advanced AI-driven applications, and the Internet of Things (IoT) has led to increased reliance on the internet by individual users and by enterprises both small and large. Advances in cybersecurity, however, have not matched this pace, which has contributed to an exponentially rising trend of cyberattacks over the past decade. To enhance network security, this work proposes ResDNViT, a robust model that integrates a self-attention-based Vision Transformer (ViT) architecture with a simplified ResNet-based architecture for NetFlow-based attack detection. Motivated by the strong performance of transformers in NLP and computer vision tasks, ResDNViT extends the ViT-based architecture to network traffic analysis by expressing NetFlow features as 2D matrices and splitting them into equal-sized sub-matrices, which are used as input patches for the encoder component. A simplified residual dense network (ResDN) with two residual dense blocks (RDBs) is stacked on the encoder’s output layer for classification. The novelty of this approach lies in effectively adapting the ViT-based architecture, originally designed for images, to the analysis of NetFlow packets for attack classification. The model was evaluated on four well-studied benchmark datasets (CICIDS2017_improved, Bot-IoT, CICIoT2022, and N-BaIoT) and performed strongly across various classification tasks. To assess the approach’s ability to detect traffic from unseen device types, devices from N-BaIoT were grouped into five categories based on usage: Thermostats, Baby Monitors, Doorbells, Security Cameras, and Webcams. The model was trained on samples from four categories at a time and tested on samples from the remaining category. High accuracy, precision, recall, and F1-score across all categories highlighted the model’s robustness in traffic discrimination.
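The two mechanical steps described in the abstract, turning a flow's feature vector into equal-sized patches and holding out one device category at a time, can be illustrated with a minimal sketch. The abstract does not state the matrix size, patch size, padding scheme, or feature count, so the 8x8 grid, 4x4 patches, zero-padding, and 60-feature input below are assumptions chosen only for illustration, and the function names `features_to_patches` and `leave_one_category_out` are hypothetical rather than taken from the paper.

```python
import numpy as np


def features_to_patches(x, grid_shape, patch_shape):
    """Reshape a 1D feature vector into a 2D matrix and cut it into patches.

    All shapes are illustrative assumptions; the abstract does not give them.
    Returns an array of shape (num_patches, patch_h * patch_w).
    """
    h, w = grid_shape
    ph, pw = patch_shape
    if h % ph or w % pw:
        raise ValueError("patch shape must evenly divide the grid shape")
    x = np.asarray(x, dtype=np.float32)
    if x.size > h * w:
        raise ValueError("too many features for the chosen grid")
    # Zero-pad so the vector fills the grid exactly (an assumed choice).
    padded = np.zeros(h * w, dtype=np.float32)
    padded[: x.size] = x
    grid = padded.reshape(h, w)
    # Split into non-overlapping blocks, ordered row by row, then flatten each.
    blocks = grid.reshape(h // ph, ph, w // pw, pw).swapaxes(1, 2)
    return blocks.reshape(-1, ph * pw)


def leave_one_category_out(categories):
    """Yield (training_categories, held_out_category) pairs."""
    for held_out in categories:
        train = [c for c in categories if c != held_out]
        yield train, held_out


if __name__ == "__main__":
    flow = np.arange(60)  # stand-in for one flow's numeric features
    patches = features_to_patches(flow, grid_shape=(8, 8), patch_shape=(4, 4))
    print(patches.shape)

    groups = ["Thermostats", "Baby Monitors", "Doorbells",
              "Security Cameras", "Webcams"]
    for train, test in leave_one_category_out(groups):
        print(test, "<-", train)
```

Each row of the returned array is one flattened patch, which a ViT-style encoder would typically project linearly into a token embedding. The split generator only enumerates which categories go to training and which is held out; selecting the actual N-BaIoT samples per category is left to the caller.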
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.