{"title":"能量约束网络中传感器调度的不宁强盗","authors":"R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant","doi":"10.1109/ICC56513.2022.10093670","DOIUrl":null,"url":null,"abstract":"We consider the problem of sensor scheduling in energy constrained network. It is modeled using restless multi-armed bandits with dynamic availability of arms. An arm represents the sensor and due to the energy constrained its availability is dynamic. The data transmission rate depends on the channel quality. Sensor scheduling problem is a sequential decision problem which needs to account both for the evolution of the channel quality and fluctuation in energy levels of sensor nodes. When sensor with available energy is scheduled, it yields data rate based on channel quality, this is referred to as immediate reward. The channel quality is modeled using two state Markov model. The higher channel state corresponds to higher quality, and hence higher immediate reward. When sensors are not scheduled, it yields no reward. Sensors with non-availability of energy are not scheduled. Further, channel quality of sensors is not observable to the decision maker but signals after data transmissions are observable. It is called as partially observable restless bandits. The objective of decision maker is to maximize infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy, and describe algorithm to compute index formula. We also study online rollout policy and analyze the computation complexity. The simulation examples compare the performances of different policies-index policy, rollout policy, and myopic policy.","PeriodicalId":101654,"journal":{"name":"2022 Eighth Indian Control Conference (ICC)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Restless Bandits for Sensor Scheduling in Energy Constrained Networks\",\"authors\":\"R. Meshram, Kesav Kaza, Varun Mehta, S. Merchant\",\"doi\":\"10.1109/ICC56513.2022.10093670\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of sensor scheduling in energy constrained network. It is modeled using restless multi-armed bandits with dynamic availability of arms. An arm represents the sensor and due to the energy constrained its availability is dynamic. The data transmission rate depends on the channel quality. Sensor scheduling problem is a sequential decision problem which needs to account both for the evolution of the channel quality and fluctuation in energy levels of sensor nodes. When sensor with available energy is scheduled, it yields data rate based on channel quality, this is referred to as immediate reward. The channel quality is modeled using two state Markov model. The higher channel state corresponds to higher quality, and hence higher immediate reward. When sensors are not scheduled, it yields no reward. Sensors with non-availability of energy are not scheduled. Further, channel quality of sensors is not observable to the decision maker but signals after data transmissions are observable. It is called as partially observable restless bandits. The objective of decision maker is to maximize infinite horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy, and describe algorithm to compute index formula. We also study online rollout policy and analyze the computation complexity. 
The simulation examples compare the performances of different policies-index policy, rollout policy, and myopic policy.\",\"PeriodicalId\":101654,\"journal\":{\"name\":\"2022 Eighth Indian Control Conference (ICC)\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 Eighth Indian Control Conference (ICC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICC56513.2022.10093670\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Eighth Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC56513.2022.10093670","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Restless Bandits for Sensor Scheduling in Energy Constrained Networks
We consider the problem of sensor scheduling in an energy-constrained network. It is modeled as a restless multi-armed bandit with dynamic availability of arms: each arm represents a sensor, and because of the energy constraint its availability is dynamic. The data transmission rate depends on the channel quality. The sensor scheduling problem is a sequential decision problem that must account both for the evolution of the channel quality and for fluctuations in the energy levels of the sensor nodes. When a sensor with available energy is scheduled, it yields a data rate determined by the channel quality; this is the immediate reward. The channel quality is modeled as a two-state Markov chain, where the higher channel state corresponds to better quality and hence a higher immediate reward. A sensor that is not scheduled yields no reward, and sensors without available energy cannot be scheduled. Further, the channel quality of the sensors is not observable to the decision maker, but the signals obtained after data transmissions are observable; this makes the model a partially observable restless bandit. The objective of the decision maker is to maximize the infinite-horizon discounted cumulative reward by sequentially scheduling sensors. We study Whittle's index policy and describe an algorithm to compute the index. We also study an online rollout policy and analyze its computational complexity. Simulation examples compare the performance of the different policies: the index policy, the rollout policy, and the myopic policy.
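To make the setup concrete, the following is a minimal simulation sketch, not the authors' implementation, of the model described in the abstract: each sensor's channel follows a two-state Markov chain, the scheduler maintains a belief over the hidden channel state, and a myopic policy schedules the energy-available sensor with the highest expected immediate data rate. All numerical values (transition probabilities, rewards, energy budget, unit energy cost per transmission) are illustrative assumptions, and only the myopic policy is shown; Whittle's index or a rollout policy would replace the arm-selection step.

# Minimal sketch of the partially observable restless-bandit scheduling model.
# Transition matrix, rewards, and energy model are illustrative assumptions,
# not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

N = 4                      # number of sensors (arms)
P = np.array([[0.8, 0.2],  # P[s, s']: channel transition matrix (0 = bad, 1 = good)
              [0.3, 0.7]])
r = np.array([0.2, 1.0])   # immediate reward (data rate) in bad / good channel state
T = 200                    # simulation length; the objective itself is discounted
beta = 0.95                # discount factor

belief = np.full(N, 0.5)          # P(channel is good) for each sensor
channel = rng.integers(0, 2, N)   # true (hidden) channel states
energy = np.full(N, 5.0)          # assumed per-sensor energy budget (1 unit per transmission)

total_reward = 0.0
for t in range(T):
    available = energy > 0
    if not available.any():
        break

    # Myopic policy: schedule the available sensor with the highest expected immediate reward.
    expected = belief * r[1] + (1 - belief) * r[0]
    expected[~available] = -np.inf
    a = int(np.argmax(expected))

    # The scheduled sensor transmits; the resulting signal reveals its channel state.
    obs = channel[a]
    total_reward += (beta ** t) * r[obs]
    energy[a] -= 1.0
    belief[a] = float(obs)

    # All channels evolve (restless); beliefs are propagated through P.
    for i in range(N):
        channel[i] = rng.random() < P[channel[i], 1]
        belief[i] = belief[i] * P[1, 1] + (1 - belief[i]) * P[0, 1]

print(f"discounted cumulative reward (myopic policy): {total_reward:.2f}")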