Privacy-Aware DRL for Differential Games-Assisted Malware Defense in Edge Intelligence-Enabled Social IoT

IF 5.4 · Zone 2 (Computer Science) · JCR Q1: Computer Science, Information Systems

IEEE Transactions on Network and Service Management Pub Date : 2026-02-18 DOI:10.1109/TNSM.2026.3666173

Shigen Shen;Jun Wu;Yizhou Shen;Xiaoping Wu;Jingnan Dong;Tian Wang;Ruidong Li

IEEE Transactions on Network and Service Management, vol. 23, pp. 2680-2693. Publication type: Journal Article. Open access: no. URL: https://ieeexplore.ieee.org/document/11398380/

Citations: 0

Abstract

The edge intelligence-enabled Social Internet of Things (SIoT) faces severe security threats from stealthy malware propagation, while existing defenses struggle to model complex behaviors or to provide real-time, privacy-aware responses. Herein, we propose a comprehensive malware defense framework integrating a five-state propagation model, continuous-time differential games, and a privacy-aware reinforcement learning algorithm named PP-D3QN (Privacy-Preserving Dueling Double Deep Q Network). The malware propagation model includes susceptible, infectious, patched, quarantined, and removed states, representing centralized and cooperative patching as well as quarantine detection mechanisms. Leveraging differential games, optimal defense strategies are theoretically derived by solving the Hamilton–Jacobi–Bellman equation, dynamically balancing infection risk, patching benefits, and quarantine costs. The PP-D3QN algorithm employs prioritized experience replay with strict control over private data sampling and Gaussian noise perturbation to ensure differential privacy, while learning effective defense strategies through practical interaction with dynamic edge intelligence-enabled SIoT systems. Extensive simulations demonstrate that the proposed method significantly improves malware suppression speed and SIoT node recovery rates. This work offers a rigorous and applicable solution for dynamic malware defense under privacy-preserving constraints in edge intelligence-enabled SIoT systems.
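The abstract names three standard building blocks behind PP-D3QN: a dueling value decomposition, a double-DQN target, and Gaussian noise perturbation. The following is a minimal sketch of the textbook form of each, not the paper's implementation. The function names, the clipping step, and the choice of which vector is perturbed are assumptions for illustration; the abstract does not specify where the noise is injected or how the privacy budget is calibrated.

```python
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online net selects the action,
    the target net evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def clip_and_perturb(vector, clip_norm, sigma, rng):
    """Gaussian-mechanism style step (assumed form): bound the L2 norm of
    the vector, then add zero-mean Gaussian noise with standard deviation
    sigma * clip_norm to every coordinate."""
    norm = sum(x * x for x in vector) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, sigma * clip_norm) for x in vector]


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical seed, for repeatability only
    print(dueling_q(1.0, [1.0, 2.0, 3.0]))
    print(double_dqn_target(1.0, 0.9, [0.2, 0.5], [1.0, 2.0], False))
    print(clip_and_perturb([3.0, 4.0], clip_norm=1.0, sigma=0.5, rng=rng))
```

Clipping before adding noise bounds the sensitivity of the perturbed quantity, which is what lets a Gaussian noise scale be tied to a differential privacy guarantee. How the paper combines this with prioritized experience replay is not stated in the abstract.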

Source journal

IEEE Transactions on Network and Service Management Computer Science-Computer Networks and Communications

CiteScore

9.30

Self-citation rate

15.10%

Articles published

325

Journal description: IEEE Transactions on Network and Service Management publishes (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) are encouraged. The journal focuses on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.