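FedKD-IDS: A robust intrusion detection system using knowledge distillation-based semi-supervised federated learning and anti-poisoning attack mechanism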

IF 14.7 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion | Pub Date: 2024-11-26 | DOI: 10.1016/j.inffus.2024.102807

Nguyen Huu Quyen, Phan The Duy, Ngo Thao Nguyen, Nghi Hoang Khoa, Van-Hau Pham

Information Fusion, Volume 117, Article 102807. Full text: https://www.sciencedirect.com/science/article/pii/S1566253524005852

Citations: 0


FedKD-IDS: A robust intrusion detection system using knowledge distillation-based semi-supervised federated learning and anti-poisoning attack mechanism

Intrusion Detection Systems (IDS) for the Internet of Things (IoT) that leverage machine learning (ML) have seen a notable increase in development and efficacy. Federated Learning-based IDSs (FL-based IDSs) in particular have grown significantly. These systems aim to mitigate data privacy breaches and minimize the communication overhead associated with dataset collection. Limited hardware resources also pose a significant constraint, preventing numerous IoT devices from actively engaging in FL. Despite these advancements, several challenges persist: elevated communication overhead, the potential for recovering private data, non-independent and identically distributed (Non-IID) data, and a scarcity of labeled data. Additionally, vulnerabilities in server-client communication during the FL process make it relatively easy for attackers to execute poisoning attacks on the client side. To address these challenges, our paper introduces a semi-supervised approach for FL-based IDS. Our approach, named FedKD-IDS, employs knowledge distillation with a voting mechanism in place of weighted parameter aggregation and incorporates an anti-poisoning method. We conducted experiments to evaluate the approach across diverse scenarios, including Non-IID and varying data distributions, and investigated various rates of malicious collaboration to demonstrate their impact on the federated training process. Results on the real-world N-BaIoT dataset indicate that our approach outperforms the state-of-the-art (SOTA) SSFL method. Notably, under a poisoning attack in which 50% of all collaborators carry out label flipping, FedKD-IDS achieved an accuracy of 79%, whereas SSFL achieved only 19.86%. Furthermore, the results show that FedKD-IDS can exclude over 85% of malicious collaborators during the aggregation phase of the federated training process.
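The abstract contrasts voting over predictions with weighted parameter aggregation, and mentions label-flipping collaborators. The following is a minimal sketch of that general idea only, not the authors' implementation: it assumes each collaborator sends hard label predictions for a shared set of unlabeled samples, and it uses a simple agreement threshold as a stand-in for the anti-poisoning step, which the abstract does not specify. The function names `majority_vote`, `flip_labels`, and `filter_by_agreement` and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def majority_vote(client_preds):
    """Return the most common label per sample across all clients.

    client_preds: list of per-client label lists, all of equal length.
    Ties resolve to the label seen first (Counter.most_common ordering).
    """
    n_samples = len(client_preds[0])
    return [
        Counter(preds[i] for preds in client_preds).most_common(1)[0][0]
        for i in range(n_samples)
    ]


def flip_labels(labels, num_classes):
    """Simulate a label-flipping client: class y becomes num_classes - 1 - y."""
    return [num_classes - 1 - y for y in labels]


def filter_by_agreement(client_preds, threshold=0.5):
    """Hypothetical filter: keep the indices of clients whose predictions
    match the majority vote on at least `threshold` of the samples."""
    consensus = majority_vote(client_preds)
    kept = []
    for idx, preds in enumerate(client_preds):
        agreement = sum(a == b for a, b in zip(preds, consensus)) / len(consensus)
        if agreement >= threshold:
            kept.append(idx)
    return kept


# Three honest clients and two label-flipping clients (binary labels).
honest = [0, 1, 1, 0, 1]
clients = [honest, honest, honest, flip_labels(honest, 2), flip_labels(honest, 2)]

pseudo_labels = majority_vote(clients)   # labels a server could distill from
kept = filter_by_agreement(clients)      # clients retained for aggregation
print(pseudo_labels, kept)
```

In this toy setup the honest majority determines the voted labels, so the two flipped clients fall below the agreement threshold and are dropped. A plain majority vote gives no such guarantee once malicious clients reach half of the collaborators, which is why a dedicated anti-poisoning step matters in that regime.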


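Source journal: Information Fusion (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months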

About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.