Reinforcement Learning for UAV Autonomous Navigation, Mapping and Target Detection

Anna Guerra, Francesco Guidi, D. Dardari, P. Djurić

2020 IEEE/ION Position, Location and Navigation Symposium (PLANS), April 2020. DOI: 10.1109/PLANS46316.2020.9110163
In this paper, we study a joint detection, mapping and navigation problem for a single unmanned aerial vehicle (UAV) equipped with a low-complexity radar and flying in an unknown environment. The goal is to optimize its trajectory so as to maximize the mapping accuracy while avoiding areas where measurements might not be sufficiently informative from the perspective of target detection. The problem is formulated as a Markov decision process (MDP) in which the UAV is an agent that runs both a state estimator, for target detection and environment mapping, and a reinforcement learning (RL) algorithm that infers its navigation policy (i.e., the control law). Numerical results show the feasibility of the proposed idea, highlighting the UAV's capability of autonomously exploring areas with a high probability of target detection while reconstructing the surrounding environment.
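To make the MDP formulation concrete, below is a minimal illustrative Python sketch of this kind of agent: the flight area is discretized into a grid whose cell index is the MDP state, a synthetic per-cell detection probability and map-uncertainty term stand in for the radar-based state estimator, and tabular Q-learning infers the navigation policy. The grid size, reward shaping, and all hyperparameters are assumptions made here for illustration and do not come from the paper, where the reward and state would instead be driven by actual radar measurements and the estimated environment map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative grid-world stand-in for the unknown environment.
GRID = 10                                     # 10x10 discretization of the flight area
N_STATES = GRID * GRID                        # UAV cell index is the MDP state
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

# Hypothetical per-cell detection probability; in the paper this information
# would come from the radar-based state estimator, not from a random draw.
p_detect = rng.uniform(0.0, 1.0, size=(GRID, GRID))

def step(state, action_idx, uncertainty):
    """Move the UAV one cell; reward blends mapping gain and detection informativeness."""
    r, c = divmod(state, GRID)
    dr, dc = ACTIONS[action_idx]
    r = min(max(r + dr, 0), GRID - 1)
    c = min(max(c + dc, 0), GRID - 1)
    # Reward cells that are still uncertain (mapping gain) and likely to yield an
    # informative detection; penalize poorly informative areas (assumed shaping).
    reward = uncertainty[r, c] + p_detect[r, c] - 0.5
    uncertainty[r, c] *= 0.5                  # visiting a cell halves its map uncertainty
    return r * GRID + c, reward

# Tabular Q-learning to infer the navigation policy (the control law).
Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1            # assumed hyperparameters

for episode in range(500):
    uncertainty = np.ones((GRID, GRID))       # environment unexplored at take-off
    s = int(rng.integers(N_STATES))           # random take-off cell
    for _ in range(200):                      # finite-horizon episode
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, rwd = step(s, a, uncertainty)
        Q[s, a] += alpha * (rwd + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

greedy_policy = np.argmax(Q, axis=1)          # learned action index per grid cell
print(greedy_policy.reshape(GRID, GRID))
```

The printed array is the greedy action for each grid cell; under the assumed reward it steers the agent toward unexplored cells with high detection probability, which mirrors, in a toy setting, the trade-off between mapping accuracy and detection informativeness described in the abstract.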