Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation

Yiwei Shi, Muning Wen, Qi Zhang, Weinan Zhang, Cunjia Liu, Weiru Liu
DOI: arxiv-2409.09541 (https://doi.org/arxiv-2409.09541)
Journal: arXiv - CS - Artificial Intelligence
Publication date: 2024-09-14
Publication type: Journal Article
Citations: 0

Abstract

Reinforcement Learning has revolutionized decision-making processes in dynamic environments, yet it often struggles with autonomously detecting and achieving goals without clear feedback signals. For example, in a Source Term Estimation problem, the lack of precise environmental information makes it challenging to provide clear feedback signals and to define and evaluate how the source's location is determined. To address this challenge, the Autonomous Goal Detection and Cessation (AGDC) module was developed, enhancing various RL algorithms by incorporating a self-feedback mechanism for autonomous goal detection and cessation upon task completion. Our method effectively identifies and ceases undefined goals by approximating the agent's belief, significantly enhancing the capabilities of RL algorithms in environments with limited feedback. To validate the effectiveness of our approach, we integrated AGDC with deep Q-Network, proximal policy optimization, and deep deterministic policy gradient algorithms, and evaluated its performance on the Source Term Estimation problem. The experimental results showed that AGDC-enhanced RL algorithms significantly outperformed traditional statistical methods such as infotaxis, entrotaxis, and dual control for exploitation and exploration, as well as a non-statistical random action selection method. These improvements were evident in terms of success rate, mean traveled distance, and search time, highlighting AGDC's effectiveness and efficiency in complex, real-world scenarios.
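The core idea described above — detecting and ceasing an undefined goal by approximating the agent's belief — can be illustrated with a minimal sketch. This is not the paper's implementation: here the belief is assumed to be a discrete posterior over candidate source locations, updated by Bayes' rule, and cessation is triggered when the belief's entropy drops below a threshold (i.e., the posterior has concentrated on one location). The function names and the entropy threshold are hypothetical choices for illustration only.

```python
import numpy as np

def update_belief(belief, likelihood):
    """Bayes update: posterior proportional to prior times observation
    likelihood, renormalized to sum to one."""
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0:
        return belief  # uninformative observation; keep the prior
    return posterior / total

def should_cease(belief, entropy_threshold=0.5):
    """Cessation check: signal task completion once the belief entropy
    falls below a threshold, meaning the posterior over the source
    location is sufficiently concentrated."""
    p = belief[belief > 0]          # ignore zero-mass cells (0 * log 0 = 0)
    entropy = -(p * np.log(p)).sum()
    return entropy < entropy_threshold

# A uniform belief over four candidate cells is maximally uncertain,
# so the agent keeps searching; a sharply peaked likelihood
# concentrates the posterior and triggers cessation.
belief = np.ones(4) / 4
print(should_cease(belief))   # uniform belief: keep searching
belief = update_belief(belief, np.array([0.97, 0.01, 0.01, 0.01]))
print(should_cease(belief))   # concentrated belief: cease
```

In an AGDC-style wrapper, `should_cease` would stand in for the missing environment termination signal: the RL agent's episode ends when its own belief, not the environment, indicates the source has been localized.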