Time Optimal Data Harvesting in Two Dimensions through Reinforcement Learning Without Engineered Reward Functions
Shili Wu, Yancheng Zhu, A. Datta, S. Andersson
2023 American Control Conference (ACC), published 2023-05-31
DOI: 10.23919/ACC55779.2023.10156033 (https://doi.org/10.23919/ACC55779.2023.10156033)
Citations: 0
Abstract
We consider the problem of harvesting data from a set of targets distributed throughout a two-dimensional environment. The targets broadcast their data to an agent flying above them, and the goal is for the agent to extract all the data and move to a desired final position in minimum time. While previous work developed optimal controllers for the one-dimensional version of the problem, such methods do not extend to the 2-D setting. We therefore first convert the problem into a discrete-time Markov Decision Process and then apply reinforcement learning, using double deep Q-learning, to find high-performing solutions. We use a simple binary cost function that directly captures the desired goal, and we overcome the sparsity of these rewards by incorporating hindsight experience replay. To improve learning efficiency, we also use prioritized sampling of the replay buffer. We demonstrate our approach through several simulations, which show performance similar to an existing optimal controller in the 1-D setting, and we explore the effect of both the replay buffer and the prioritized sampling in the 2-D setting.
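The three ingredients named in the abstract (a binary sparse reward with hindsight experience replay, double deep Q-learning targets, and prioritized replay sampling) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0/-1 reward convention, and the proportional-prioritization exponent are illustrative assumptions, and Q-values are stored in plain arrays rather than neural networks to keep the example self-contained.

```python
import numpy as np

def her_relabel(trajectory, achieved_goal):
    """Hindsight experience replay: relabel a (possibly failed) episode
    as if the goal actually reached at the end had been the intended one,
    so the sparse binary reward yields useful learning signal.
    Reward convention (an assumption, not from the paper): 0 on reaching
    the goal, -1 otherwise."""
    relabeled = []
    for state, action, next_state in trajectory:
        reward = 0.0 if np.allclose(next_state, achieved_goal) else -1.0
        relabeled.append((state, action, reward, next_state, achieved_goal))
    return relabeled

def double_dqn_target(q_online, q_target, next_state_idx, reward, gamma, done):
    """Double DQN target: the online network selects the greedy action,
    the target network evaluates it, reducing overestimation bias.
    Here q_online and q_target are (n_states, n_actions) arrays standing
    in for the two networks."""
    a_star = int(np.argmax(q_online[next_state_idx]))
    bootstrap = 0.0 if done else gamma * q_target[next_state_idx, a_star]
    return reward + bootstrap

def prioritized_sample(priorities, batch_size, alpha=0.6, rng=None):
    """Proportional prioritized sampling: transitions with larger
    (e.g. TD-error-based) priorities are drawn more often."""
    rng = rng if rng is not None else np.random.default_rng(0)
    p = np.asarray(priorities, dtype=float) ** alpha
    p /= p.sum()
    return rng.choice(len(p), size=batch_size, p=p)
```

For example, an episode that ends at `(2, 0)` without ever reaching the original goal is relabeled by `her_relabel` so that its final transition earns reward 0, which is exactly how a sparse binary cost becomes informative without hand-engineering a shaped reward.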