{"title":"能量约束网络中传感器调度的不宁强盗","authors":"R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant","doi":"10.1109/ICC56513.2022.10093670","DOIUrl":null,"url":null,"abstract":"We consider the problem of sensor scheduling in energy constrained network. It is modeled using restless multi-armed bandits with dynamic availability of arms. An arm represents the sensor and due to the energy constrained its availability is dynamic. The data transmission rate depends on the channel quality. Sensor scheduling problem is a sequential decision problem which needs to account both for the evolution of the channel quality and fluctuation in energy levels of sensor nodes. When sensor with available energy is scheduled, it yields data rate based on channel quality, this is referred to as immediate reward. The channel quality is modeled using two state Markov model. The higher channel state corresponds to higher quality, and hence higher immediate reward. When sensors are not scheduled, it yields no reward. Sensors with non-availability of energy are not scheduled. Further, channel quality of sensors is not observable to the decision maker but signals after data transmissions are observable. It is called as partially observable restless bandits. The objective of decision maker is to maximize infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy, and describe algorithm to compute index formula. We also study online rollout policy and analyze the computation complexity. The simulation examples compare the performances of different policies-index policy, rollout policy, and myopic policy.","PeriodicalId":101654,"journal":{"name":"2022 Eighth Indian Control Conference (ICC)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Restless Bandits for Sensor Scheduling in Energy Constrained Networks\",\"authors\":\"R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant\",\"doi\":\"10.1109/ICC56513.2022.10093670\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of sensor scheduling in energy constrained network. It is modeled using restless multi-armed bandits with dynamic availability of arms. An arm represents the sensor and due to the energy constrained its availability is dynamic. The data transmission rate depends on the channel quality. Sensor scheduling problem is a sequential decision problem which needs to account both for the evolution of the channel quality and fluctuation in energy levels of sensor nodes. When sensor with available energy is scheduled, it yields data rate based on channel quality, this is referred to as immediate reward. The channel quality is modeled using two state Markov model. The higher channel state corresponds to higher quality, and hence higher immediate reward. When sensors are not scheduled, it yields no reward. Sensors with non-availability of energy are not scheduled. Further, channel quality of sensors is not observable to the decision maker but signals after data transmissions are observable. It is called as partially observable restless bandits. The objective of decision maker is to maximize infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy, and describe algorithm to compute index formula. We also study online rollout policy and analyze the computation complexity. 
The simulation examples compare the performances of different policies-index policy, rollout policy, and myopic policy.\",\"PeriodicalId\":101654,\"journal\":{\"name\":\"2022 Eighth Indian Control Conference (ICC)\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 Eighth Indian Control Conference (ICC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICC56513.2022.10093670\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Eighth Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC56513.2022.10093670","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Restless Bandits for Sensor Scheduling in Energy Constrained Networks
We consider the problem of sensor scheduling in an energy-constrained network. It is modeled as a restless multi-armed bandit with dynamic availability of arms: each arm represents a sensor, and because of the energy constraint its availability is dynamic. The data transmission rate depends on the channel quality. The sensor scheduling problem is a sequential decision problem that must account both for the evolution of the channel quality and for fluctuations in the energy levels of the sensor nodes. When a sensor with available energy is scheduled, it yields a data rate determined by the channel quality; this is the immediate reward. The channel quality is modeled as a two-state Markov chain, where the higher channel state corresponds to better quality and hence a higher immediate reward. A sensor that is not scheduled yields no reward, and sensors without available energy cannot be scheduled. Further, the channel quality of the sensors is not observable to the decision maker, but the signals obtained after data transmissions are observable; this makes the model a partially observable restless bandit. The objective of the decision maker is to maximize the infinite-horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy and describe an algorithm to compute the index. We also study an online rollout policy and analyze its computational complexity. Simulation examples compare the performance of the different policies: the index policy, the rollout policy, and the myopic policy.
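To make the setup concrete, the following is a minimal simulation sketch, not the authors' implementation, of the model described in the abstract: each sensor's channel follows a two-state Markov chain, the scheduler maintains a belief over the hidden channel state, and a myopic policy schedules the energy-available sensor with the highest expected immediate data rate. All numerical values (transition probabilities, rewards, energy budget, unit energy cost per transmission) are illustrative assumptions, and only the myopic policy is shown; Whittle's index or a rollout policy would replace the arm-selection step.

# Minimal sketch of the partially observable restless-bandit scheduling model.
# Transition matrix, rewards, and energy model are illustrative assumptions,
# not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

N = 4                      # number of sensors (arms)
P = np.array([[0.8, 0.2],  # P[s, s']: channel transition matrix (0 = bad, 1 = good)
              [0.3, 0.7]])
r = np.array([0.2, 1.0])   # immediate reward (data rate) in bad / good channel state
T = 200                    # simulation length; the objective itself is discounted
beta = 0.95                # discount factor

belief = np.full(N, 0.5)          # P(channel is good) for each sensor
channel = rng.integers(0, 2, N)   # true (hidden) channel states
energy = np.full(N, 5.0)          # assumed per-sensor energy budget (1 unit per transmission)

total_reward = 0.0
for t in range(T):
    available = energy > 0
    if not available.any():
        break

    # Myopic policy: schedule the available sensor with the highest expected immediate reward.
    expected = belief * r[1] + (1 - belief) * r[0]
    expected[~available] = -np.inf
    a = int(np.argmax(expected))

    # The scheduled sensor transmits; the resulting signal reveals its channel state.
    obs = channel[a]
    total_reward += (beta ** t) * r[obs]
    energy[a] -= 1.0
    belief[a] = float(obs)

    # All channels evolve (restless); beliefs are propagated through P.
    for i in range(N):
        channel[i] = rng.random() < P[channel[i], 1]
        belief[i] = belief[i] * P[1, 1] + (1 - belief[i]) * P[0, 1]

print(f"discounted cumulative reward (myopic policy): {total_reward:.2f}")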