Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions

Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu
{"title":"利用具有平滑奖励函数的软代理批判算法学习失速恢复策略","authors":"Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu","doi":"10.1109/ROBIO58561.2023.10354940","DOIUrl":null,"url":null,"abstract":"We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because altitude decreases can result in fatal accidents. Stall recovery policies perform appropriate control sequences to save aircrafts from such lethal situations. Learning stall recovery policies using reinforcement learning methods is desirable because such policies can be learned automatically. However, stall recovery training is challenging since the interplay between an aircraft and its environment is very complicated. In this work, the proposed stall recovery learning approach yields better performance than other methods. We successfully apply smooth reward functions to the learning process because reward functions are critical for the convergence of policy learning. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies. The comparison results show that our method provides better results than previous algorithms.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"64 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions\",\"authors\":\"Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu\",\"doi\":\"10.1109/ROBIO58561.2023.10354940\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because altitude decreases can result in fatal accidents. Stall recovery policies perform appropriate control sequences to save aircrafts from such lethal situations. Learning stall recovery policies using reinforcement learning methods is desirable because such policies can be learned automatically. However, stall recovery training is challenging since the interplay between an aircraft and its environment is very complicated. In this work, the proposed stall recovery learning approach yields better performance than other methods. We successfully apply smooth reward functions to the learning process because reward functions are critical for the convergence of policy learning. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies. 
The comparison results show that our method provides better results than previous algorithms.\",\"PeriodicalId\":505134,\"journal\":{\"name\":\"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)\",\"volume\":\"64 1\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBIO58561.2023.10354940\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO58561.2023.10354940","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because the resulting loss of altitude can lead to fatal accidents. Stall recovery policies execute appropriate control sequences to save aircraft from such lethal situations. Learning stall recovery policies with reinforcement learning is desirable because such policies can be learned automatically. However, stall recovery training is challenging since the interplay between an aircraft and its environment is very complicated. In this work, the proposed stall recovery learning approach yields better performance than other methods. We successfully apply smooth reward functions to the learning process because reward functions are critical for the convergence of policy learning. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies. The comparison results show that our method provides better results than previous algorithms.
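The abstract names two ingredients without reproducing the paper's equations: smooth reward functions, and reward scaling combined with SAC's automatic entropy learning. The sketch below is an illustration under stated assumptions, not the authors' implementation: the reward shape, the state features (angle of attack `alpha_deg`, altitude loss `alt_loss_m`), the scale factor, and the action dimension are all hypothetical; only the temperature update follows the standard SAC formulation.

```python
import numpy as np
import torch

# Hypothetical smooth reward for stall recovery. Exponential kernels make the
# reward continuous and differentiable everywhere, unlike step/threshold
# rewards. `alpha_deg` (angle of attack) and `alt_loss_m` (altitude lost since
# stall onset) are assumed state features, not the paper's exact formulation.
def smooth_reward(alpha_deg: float, alt_loss_m: float,
                  alpha_ref: float = 5.0, scale: float = 0.1) -> float:
    r_alpha = np.exp(-((alpha_deg - alpha_ref) / 10.0) ** 2)  # reward near-nominal AoA
    r_alt = np.exp(-alt_loss_m / 500.0)                       # penalize altitude loss
    return scale * (r_alpha + r_alt)                          # `scale` is the reward-scaling factor

# SAC automatic entropy learning: the temperature is trained so the policy
# entropy tracks a fixed target, commonly -dim(action_space).
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_entropy = -4.0  # assuming 4 controls: elevator, aileron, rudder, throttle

def update_temperature(log_probs: torch.Tensor) -> float:
    """One gradient step on J(alpha) = E[-alpha * (log_pi + target_entropy)]."""
    alpha_loss = -(log_alpha.exp() * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()
```

In a typical SAC loop, `update_temperature` would be called once per gradient step with the log-probabilities of actions sampled from the current policy; the returned temperature then weights the entropy bonus in the actor and critic losses, while the scaled reward enters the critic targets.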