{"title":"Transformer-based knowledge distillation for explainable intrusion detection system","authors":"Nadiah AL-Nomasy, Abdulelah Alamri, Ahamed Aljuhani, Prabhat Kumar","doi":"10.1016/j.cose.2025.104417","DOIUrl":null,"url":null,"abstract":"<div><div>The rapid expansion of IoT networks has increased the risk of cyber threats, making intrusion detection systems (IDS) critical for maintaining security. However, most existing IDS rely on computationally intensive deep learning architectures, rendering them unsuitable for resource-constrained IoT environments. Additionally, existing IDS approaches, including those using Knowledge Distillation (KD), often fail to capture the complex temporal dependencies and contextual relationships inherent in IoT traffic, which limits their ability to detect complex multi-stage attacks. Furthermore, these models frequently lack transparency, hindering effective decision-making by security experts. To address these gaps, we propose DistillGuard, a novel IDS framework designed specifically for IoT networks. The proposed framework employs a Transformer-based teacher model, which utilizes a hybrid attention mechanism combining multi-head self-attention (MHSA) and cross-attention layers to effectively capture both temporal and contextual patterns in network traffic. The framework further incorporates a Selective Gradient-Based Knowledge Distillation (SG-KD) process to transfer critical knowledge from the teacher model to a lightweight student model, optimizing performance while reducing computational costs. In addition, DistillGuard integrates gradient contribution heatmaps, layer-wise contribution analysis, and gradient selection impact analysis to provide detailed explainability, enabling security experts to understand which layers contribute to the detection of attacks. Experimental results demonstrate that DistillGuard achieves superior detection accuracy and efficiency compared to existing state-of-the-art IDS models.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"154","pages":"Article 104417"},"PeriodicalIF":4.8000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404825001063","RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
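The SG-KD process described in the abstract builds on the standard soft-target distillation objective, in which a lightweight student is trained to match the temperature-softened output distribution of the teacher. The paper's selective-gradient step is specific to DistillGuard and is not reproduced here; the sketch below shows only the generic Hinton-style distillation loss that such a scheme presumably extends, with all function names and the toy logits being illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher T softens the distribution,
    exposing the teacher's 'dark knowledge' about non-target classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student
    predictions. NOTE: this is the generic KD objective, not the
    paper's SG-KD variant, which additionally selects which gradients
    to transfer."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return temperature ** 2 * kl

# Toy example (hypothetical logits): the teacher is confident the traffic
# belongs to class 0 (attack); the student only partially agrees, so the
# loss is positive and shrinks as the student matches the teacher.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 1.0]
loss = distillation_loss(teacher, student)
```

The temperature hyperparameter controls how much of the teacher's uncertainty over non-target classes is transferred; at T = 1 the loss reduces to ordinary KL divergence between the two predictive distributions.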
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at professionals involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.