Sensor Activation Policy Optimization for Opacity Enforcement Based on Reinforcement Learning

IF 4.3 | CAS Zone 2 (Comprehensive Journals) | JCR Q1: ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Sensors Journal, vol. 24, no. 22, pp. 38429-38439. Pub Date: 2024-10-07. DOI: 10.1109/JSEN.2024.3471931. https://ieeexplore.ieee.org/document/10706800/

Jiahan He; Deguang Wang; Ming Yang; Chengbin Liang


Citations: 0

Abstract


Sensor Activation Policy Optimization for Opacity Enforcement Based on Reinforcement Learning

As a confidentiality property, opacity characterizes the ability of an external intruder to infer the secret information of a system. Ensuring opacity can be realized by dynamic sensor activation to manage event observability. By controlling which sensors are active and what events are observable, the system can effectively prevent the exposure of sensitive information, ensuring that the confidential parts of its behavior remain opaque. In practice, event hiding and sensor switching involved in dynamic sensor activation are recognized as costly operations. This study addresses the numerical optimization problem of sensor activation policy (SAP) to enforce opacity using reinforcement learning (RL). A most permissive observer (MPO) is used to incorporate all valid SAPs that ensure opacity. The quantitative objective of the optimization problem is to minimize the maximum discounted total cost. A systematic procedure is provided to convert MPO into a Markov game, facilitating the use of RL techniques. Minimax Q-learning is presented as the methodology to derive an optimal policy for sensor activation/deactivation decisions from the convergent Q-table. Finally, the effectiveness and applicability of the proposed method are demonstrated on a location-tracking problem in a smart factory setting.
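To make the learning step in the abstract concrete, here is a minimal sketch of tabular minimax Q-learning with a cost-minimizing agent and a worst-case opponent. Everything specific in it is an assumption for illustration: the two-state game in `COST` and `NEXT`, the values of `GAMMA` and `ALPHA`, and the function names are hypothetical and are not taken from the paper, whose Markov game is constructed from the MPO. The sketch also takes the minimum and maximum over pure actions, which matches a "minimize the maximum discounted total cost" objective; the general minimax-Q algorithm instead solves a small linear program over mixed strategies at each state.

```python
import random

GAMMA = 0.5  # discount factor (illustrative value, not from the paper)
ALPHA = 0.5  # learning rate (illustrative value, not from the paper)

# Hypothetical toy Markov game, NOT the model built from the MPO in the paper.
# COST[s][a][o]: stage cost in state s when the sensor agent plays a
#                and the opponent (environment) plays o.
# NEXT[s][a][o]: successor state (deterministic here for simplicity).
COST = [
    [[1.0, 2.0], [3.0, 0.0]],
    [[0.0, 4.0], [1.0, 1.0]],
]
NEXT = [
    [[0, 1], [0, 1]],
    [[0, 1], [1, 0]],
]


def state_value(q, s):
    """Worst-case value of s: min over agent actions of max over opponent actions."""
    return min(max(row) for row in q[s])


def minimax_q_learning(steps=5000, seed=0):
    """Learn Q[s][a][o] from uniformly sampled (s, a, o) transitions."""
    rng = random.Random(seed)
    q = [[[0.0 for _ in COST[s][a]] for a in range(len(COST[s]))]
         for s in range(len(COST))]
    for _ in range(steps):
        s = rng.randrange(len(COST))
        a = rng.randrange(len(COST[s]))
        o = rng.randrange(len(COST[s][a]))
        # Bootstrapped target: stage cost plus discounted minimax value of the successor.
        target = COST[s][a][o] + GAMMA * state_value(q, NEXT[s][a][o])
        q[s][a][o] += ALPHA * (target - q[s][a][o])
    return q


def greedy_policy(q):
    """For each state, pick the agent action with the smallest worst-case Q-value."""
    return [min(range(len(q[s])), key=lambda a: max(q[s][a]))
            for s in range(len(q))]


if __name__ == "__main__":
    q = minimax_q_learning()
    print([round(state_value(q, s), 3) for s in range(len(q))])
    print(greedy_policy(q))
```

For this toy game the fixed point can be checked by hand: the minimax values are 10/3 for state 0 and 8/3 for state 1, with action 0 chosen in state 0 and action 1 in state 1. The policy is read off the converged Q-table in the same way the abstract describes, by taking the action that minimizes the worst-case Q-value in each state.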


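Source journal: IEEE Sensors Journal (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.70

Self-citation rate: 14.00%

Articles published: 2058

Review time: 5.2 months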

Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal deals with the following:

- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave data, e.g., electromagnetic and acoustic, and non-wave data, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data; detection, estimation, and classification based on sensor data)
- Sensors in Industrial Practice