Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation

Yiwei Shi, Muning Wen, Qi Zhang, Weinan Zhang, Cunjia Liu, Weiru Liu
Published in arXiv - CS - Artificial Intelligence, 2024-09-14 (https://doi.org/arxiv-2409.09541)

Abstract

Reinforcement Learning (RL) has revolutionized decision-making in dynamic environments, yet it often struggles to autonomously detect and achieve goals in the absence of clear feedback signals. In the Source Term Estimation (STE) problem, for example, the lack of precise environmental information makes it difficult to provide clear feedback signals and to define and evaluate how the source's location is determined. To address this challenge, we developed the Autonomous Goal Detection and Cessation (AGDC) module, which enhances various RL algorithms with a self-feedback mechanism that autonomously detects the goal and ceases the task upon completion. By approximating the agent's belief, our method identifies goals that are not explicitly defined and stops once they are reached, significantly enhancing the capabilities of RL algorithms in environments with limited feedback. To validate the effectiveness of our approach, we integrated AGDC with the Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Deep Deterministic Policy Gradient (DDPG) algorithms and evaluated its performance on the STE problem. The experimental results show that AGDC-enhanced RL algorithms significantly outperform traditional statistical methods, such as infotaxis, entrotaxis, and dual control for exploitation and exploration (DCEE), as well as a non-statistical random action selection method. These improvements were evident in success rate, mean traveled distance, and search time, highlighting AGDC's effectiveness and efficiency in complex, real-world scenarios.
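The abstract says AGDC detects the goal and ceases the task by approximating the agent's belief, but it does not spell out the mechanism. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the belief over a 2-D source location is represented by weighted particles (a common representation in STE search methods) and treats the goal as detected once the weighted spread of those particles falls below a threshold. The functions `belief_spread` and `check_cessation` and the `threshold` value are hypothetical names chosen for illustration.

```python
import numpy as np


def belief_spread(particles: np.ndarray, weights: np.ndarray) -> float:
    """Weighted RMS distance of particles (N x 2 positions) from their weighted mean."""
    w = weights / weights.sum()                      # normalise weights to a distribution
    mean = w @ particles                             # weighted mean position, shape (2,)
    sq_dist = ((particles - mean) ** 2).sum(axis=1)  # squared distance of each particle
    return float(np.sqrt(w @ sq_dist))


def check_cessation(particles: np.ndarray, weights: np.ndarray,
                    threshold: float) -> tuple[bool, np.ndarray]:
    """Hypothetical stopping rule: declare the source found once the belief is
    concentrated (spread below `threshold`) and return the point estimate."""
    w = weights / weights.sum()
    estimate = w @ particles                         # weighted mean as the source estimate
    return belief_spread(particles, weights) < threshold, estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Early in a search (assumed): belief spread over a 50 m x 50 m area.
    diffuse = rng.uniform(0.0, 50.0, size=(1000, 2))
    # Late in a search (assumed): belief concentrated around (10, 20).
    focused = rng.normal(loc=[10.0, 20.0], scale=0.5, size=(1000, 2))
    uniform_w = np.ones(1000)
    for name, p in [("diffuse", diffuse), ("focused", focused)]:
        done, est = check_cessation(p, uniform_w, threshold=2.0)
        print(name, done, est.round(1))
```

In an RL loop built this way, the returned flag could terminate the episode and the estimate could be reported as the source location. Whether AGDC uses a spread criterion like this, a different belief statistic, or a learned detector is not stated in the abstract.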