Low-latency and interpretable intrusion detection for IIoT using self-supervised learning with entropy-based masking

IF 4.9 · Zone 3 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Computers & Electrical Engineering Pub Date : 2025-10-10 DOI:10.1016/j.compeleceng.2025.110753

Fasih Ullah Khan, Adnan Noor Mian

Computers & Electrical Engineering, Volume 128, Article 110753. Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S0045790625006962

Citations: 0

Abstract



Low-latency and interpretable intrusion detection for IIoT using self-supervised learning with entropy-based masking

The Industrial Internet of Things (IIoT) has enhanced data connectivity across domains such as smart cities and industry, but this advancement has also created security risks that call for robust countermeasures. One critical challenge in developing effective intrusion detection systems (IDS) for IIoT is class imbalance in training datasets: in most cases benign traffic predominates, which biases model training and degrades the detection of rare attacks. To detect both normal traffic and the various attack categories even under label scarcity and class imbalance, we propose a low-latency gradient boosting framework for efficient intrusion detection. Our approach uses self-supervised learning (SSL) to improve efficiency and robustness. It is a hybrid: a Masked Autoencoder (MAE) extracts robust representations from unlabeled data, and LightGBM then performs the classification. To enhance the learning capability of the proposed framework, we integrate an entropy-based masking strategy into the MAE, so that features with high uncertainty are masked with high probability during training. This targeted feature selection enables the model to reconstruct the most informative features. As a result, the model is more robust and can capture strong feature dependencies, even in the presence of imbalanced and label-scarce data. We validate the model's effectiveness on three publicly available datasets: BoT-IoT, ToN-IoT, and WUSTL-IIoT. The proposed framework improves inference time by a factor of 104 over state-of-the-art (SOTA) methods. It also achieves precision, recall, and F1-score of 99%, 93%, and 95%, respectively, which are comparable to existing SOTA methods.
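The abstract does not give the formula behind the entropy-based masking, so the following is only a minimal sketch of one way such a strategy could look, not the authors' implementation. It assumes per-feature Shannon entropy estimated from a histogram, and masking probabilities proportional to that entropy and scaled to a target average mask ratio. The function names, the `bins` and `mask_ratio` parameters, and the toy data are all hypothetical.

```python
import numpy as np

def feature_entropy(X, bins=10):
    """Shannon entropy (in bits) of each column, estimated from a histogram."""
    n_samples, n_features = X.shape
    entropies = np.zeros(n_features)
    for j in range(n_features):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / n_samples
        entropies[j] = -np.sum(p * np.log2(p))
    return entropies

def masking_probabilities(entropies, mask_ratio=0.3):
    """Assumed rule: probability proportional to entropy, with the mean
    masking probability close to mask_ratio (exact when nothing is clipped)."""
    total = entropies.sum()
    if total == 0:
        return np.full_like(entropies, mask_ratio)
    probs = entropies / total * mask_ratio * len(entropies)
    return np.clip(probs, 0.0, 1.0)

def sample_mask(probs, n_samples, rng):
    """Boolean mask, True where a feature value is hidden from the encoder."""
    return rng.random((n_samples, len(probs))) < probs

# Toy data (hypothetical): three features with very different uncertainty.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.random(1000),           # near-uniform values: high entropy
    np.zeros(1000),             # constant column: zero entropy
    rng.integers(0, 2, 1000),   # binary flag: about 1 bit
])

H = feature_entropy(X)
probs = masking_probabilities(H)
mask = sample_mask(probs, len(X), rng)
X_masked = np.where(mask, 0.0, X)   # masked entries replaced by 0 for the MAE input
```

In a full pipeline along the lines the abstract describes, an autoencoder would be trained to reconstruct `X` from `X_masked`, and its encoder output would then be passed to a LightGBM classifier. Those stages are left out here to keep the sketch dependency-free.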


Source journal

Computers & Electrical Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

9.20

Self-citation rate

7.00%

Articles published

661

Review time

47 days

About the journal: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.