Restless Bandits for Sensor Scheduling in Energy Constrained Networks

R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant
{"title":"能量约束网络中传感器调度的不宁强盗","authors":"R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant","doi":"10.1109/ICC56513.2022.10093670","DOIUrl":null,"url":null,"abstract":"We consider the problem of sensor scheduling in energy constrained network. It is modeled using restless multi-armed bandits with dynamic availability of arms. An arm represents the sensor and due to the energy constrained its availability is dynamic. The data transmission rate depends on the channel quality. Sensor scheduling problem is a sequential decision problem which needs to account both for the evolution of the channel quality and fluctuation in energy levels of sensor nodes. When sensor with available energy is scheduled, it yields data rate based on channel quality, this is referred to as immediate reward. The channel quality is modeled using two state Markov model. The higher channel state corresponds to higher quality, and hence higher immediate reward. When sensors are not scheduled, it yields no reward. Sensors with non-availability of energy are not scheduled. Further, channel quality of sensors is not observable to the decision maker but signals after data transmissions are observable. It is called as partially observable restless bandits. The objective of decision maker is to maximize infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy, and describe algorithm to compute index formula. We also study online rollout policy and analyze the computation complexity. The simulation examples compare the performances of different policies-index policy, rollout policy, and myopic policy.","PeriodicalId":101654,"journal":{"name":"2022 Eighth Indian Control Conference (ICC)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Restless Bandits for Sensor Scheduling in Energy Constrained Networks\",\"authors\":\"R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant\",\"doi\":\"10.1109/ICC56513.2022.10093670\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of sensor scheduling in energy constrained network. It is modeled using restless multi-armed bandits with dynamic availability of arms. An arm represents the sensor and due to the energy constrained its availability is dynamic. The data transmission rate depends on the channel quality. Sensor scheduling problem is a sequential decision problem which needs to account both for the evolution of the channel quality and fluctuation in energy levels of sensor nodes. When sensor with available energy is scheduled, it yields data rate based on channel quality, this is referred to as immediate reward. The channel quality is modeled using two state Markov model. The higher channel state corresponds to higher quality, and hence higher immediate reward. When sensors are not scheduled, it yields no reward. Sensors with non-availability of energy are not scheduled. Further, channel quality of sensors is not observable to the decision maker but signals after data transmissions are observable. It is called as partially observable restless bandits. The objective of decision maker is to maximize infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy, and describe algorithm to compute index formula. We also study online rollout policy and analyze the computation complexity. 
The simulation examples compare the performances of different policies-index policy, rollout policy, and myopic policy.\",\"PeriodicalId\":101654,\"journal\":{\"name\":\"2022 Eighth Indian Control Conference (ICC)\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 Eighth Indian Control Conference (ICC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICC56513.2022.10093670\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Eighth Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC56513.2022.10093670","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We consider the problem of sensor scheduling in an energy constrained network, modeled as a restless multi-armed bandit with dynamic availability of arms. Each arm represents a sensor whose availability is dynamic because of its energy constraint. The data transmission rate depends on the channel quality, so sensor scheduling is a sequential decision problem that must account both for the evolution of channel quality and for fluctuations in the energy levels of sensor nodes. When a sensor with available energy is scheduled, it yields a data rate determined by the channel quality; this is the immediate reward. Channel quality is modeled as a two-state Markov chain, where the higher channel state corresponds to higher quality and hence higher immediate reward. Sensors that are not scheduled yield no reward, and sensors without available energy cannot be scheduled. Further, the channel quality of a sensor is not observable to the decision maker, but the signals after data transmissions are observable; this makes the model a partially observable restless bandit. The decision maker's objective is to maximize the infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy and describe an algorithm to compute the index. We also study an online rollout policy and analyze its computational complexity. Simulation examples compare the performance of three policies: the index policy, the rollout policy, and the myopic policy.
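
To make the setting concrete, here is a minimal, self-contained sketch (not the paper's implementation) of the model the abstract describes: each arm's channel is a hidden two-state Markov chain, the scheduler maintains a belief that each channel is in the good state, and a myopic policy picks the energy-available sensor with the highest expected immediate reward. All numeric values (transition matrix, rewards, energy model) are illustrative assumptions, and the sketch simplifies the observation model by letting a transmission reveal the channel state exactly, whereas the paper only assumes a signal is observed after transmission.

```python
import numpy as np

# Minimal sketch of a partially observable restless bandit for sensor
# scheduling (illustrative assumptions throughout, not the paper's code).
rng = np.random.default_rng(0)

N = 4                        # number of sensors (arms); illustrative value
P = np.array([[0.7, 0.3],    # assumed channel transition matrix P[s, s']
              [0.2, 0.8]])   # state 0 = bad channel, state 1 = good channel
r = np.array([0.2, 1.0])     # assumed immediate reward (data rate) per state

state = rng.integers(0, 2, size=N)   # true channel states (hidden)
belief = np.full(N, 0.5)             # belief that each channel is good
energy = np.ones(N, dtype=bool)      # energy availability of each sensor

T, gamma, total = 20, 0.95, 0.0
for t in range(T):
    # Myopic policy: among energy-available sensors, schedule the one
    # with the highest expected immediate reward under current beliefs.
    avail = np.flatnonzero(energy)
    if avail.size > 0:
        exp_reward = belief[avail] * r[1] + (1 - belief[avail]) * r[0]
        a = avail[np.argmax(exp_reward)]
        total += (gamma ** t) * r[state[a]]
        # Simplifying assumption: the transmission reveals the channel
        # state of the scheduled arm, so its belief is reset exactly.
        belief[a] = float(state[a])
    # Unscheduled arms stay unobserved; all beliefs propagate one step
    # through the Markov chain (the "restless" dynamics).
    belief = belief * P[1, 1] + (1 - belief) * P[0, 1]
    # All channels evolve, scheduled or not; energy availability
    # fluctuates under an assumed recharge/depletion model.
    for i in range(N):
        state[i] = rng.choice(2, p=P[state[i]])
    energy = rng.random(N) > 0.3

print(f"discounted cumulative reward over {T} slots: {total:.3f}")
```

The paper's Whittle index and rollout policies would replace the myopic argmax with an index computation or a simulated lookahead over the same belief dynamics; the belief update shown here (exact observation on the scheduled arm, Markov propagation on the rest) is the standard one for such models.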