Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions

Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu

2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1-6, December 4, 2023. DOI: 10.1109/ROBIO58561.2023.10354940
We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because the resulting loss of altitude can lead to fatal accidents. Stall recovery policies execute appropriate control sequences to recover aircraft from such lethal situations. Learning these policies with reinforcement learning is desirable because they can be acquired automatically. However, training is challenging because the interaction between an aircraft and its environment is highly complicated. Because reward functions are critical to the convergence of policy learning, we apply smooth reward functions throughout the learning process. We further achieve strong performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies, and comparisons show that our method outperforms previous algorithms.
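The abstract does not give the paper's exact reward formulation, so the following is only a minimal sketch of the two ingredients it names: a smooth, scaled reward and the standard automatic entropy-temperature update used in soft actor-critic (Haarnoja et al., 2018). All state variables (alpha_deg, pitch_deg, altitude_loss_m) and coefficients below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def smooth_stall_reward(alpha_deg, pitch_deg, altitude_loss_m,
                        reward_scale=5.0):
    """Hypothetical smooth, bounded stall-recovery reward.

    Gaussian terms vary smoothly with the flight state, avoiding the
    discontinuities of sparse success/failure rewards; reward_scale
    plays the role of the reward scaling the abstract mentions.
    """
    r_alpha = np.exp(-(alpha_deg / 10.0) ** 2)       # favor low angle of attack
    r_pitch = np.exp(-(pitch_deg / 20.0) ** 2)       # favor level pitch
    r_alt = np.exp(-(altitude_loss_m / 200.0) ** 2)  # penalize altitude loss
    return reward_scale * (r_alpha + r_pitch + r_alt) / 3.0

class EntropyTemperature:
    """Standard SAC automatic entropy tuning: gradient descent on
    J(alpha) = E[-alpha * (log_pi + H_target)] with respect to log(alpha)."""

    def __init__(self, action_dim, lr=3e-4):
        self.log_alpha = 0.0
        self.target_entropy = -float(action_dim)  # common heuristic: -|A|
        self.lr = lr

    @property
    def alpha(self):
        return float(np.exp(self.log_alpha))

    def update(self, log_pi_batch):
        # dJ/d(log_alpha) = -alpha * E[log_pi + H_target]
        grad = -self.alpha * float(np.mean(log_pi_batch) + self.target_entropy)
        self.log_alpha -= self.lr * grad
        return self.alpha

# Example usage with made-up flight-state values and policy log-probabilities:
r = smooth_stall_reward(alpha_deg=18.0, pitch_deg=-25.0, altitude_loss_m=150.0)
temp = EntropyTemperature(action_dim=4)
alpha = temp.update(log_pi_batch=np.array([-1.2, -0.8, -1.5]))
```

In this sketch the temperature alpha rises when the policy's entropy falls below the target and decays otherwise, which is how SAC balances exploration against reward without hand-tuning the entropy coefficient.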