利用水下航行器进行搜索和检查任务的q -学习评估

G. Frost, D. Lane
{"title":"利用水下航行器进行搜索和检查任务的q -学习评估","authors":"G. Frost, D. Lane","doi":"10.1109/OCEANS.2014.7003088","DOIUrl":null,"url":null,"abstract":"An application for offline Reinforcement Learning in the underwater domain is proposed. We present and evaluate the integration of the Q-learning algorithm into an Autonomous Underwater Vehicle (AUV) for learning the action-value function in simulation. Three separate experiments are presented. The first compares two search policies: the ε - least visited, and random action, with respect to convergence time. The second experiment presents the effect of the learning discount factor, gamma, on the convergence time of the ε - least visited search policy. The final experiment is to validate the use of a policy learnt offline on a real AUV. This learning phase occurs offline within the continuous simulation environment which had been discretized into a grid-world learning problem. Presented results show the system's convergence to a global optimal solution whilst following both sub-optimal policies during simulation. Future work is introduced, after discussion of our results, to enable the system to be used in a real world application. The results presented, therefore, form the basis for future comparative analysis of the necessary improvements such as function approximation of the state space.","PeriodicalId":368693,"journal":{"name":"2014 Oceans - St. John's","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Evaluation of Q-learning for search and inspect missions using underwater vehicles\",\"authors\":\"G. Frost, D. Lane\",\"doi\":\"10.1109/OCEANS.2014.7003088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An application for offline Reinforcement Learning in the underwater domain is proposed. We present and evaluate the integration of the Q-learning algorithm into an Autonomous Underwater Vehicle (AUV) for learning the action-value function in simulation. Three separate experiments are presented. The first compares two search policies: the ε - least visited, and random action, with respect to convergence time. The second experiment presents the effect of the learning discount factor, gamma, on the convergence time of the ε - least visited search policy. The final experiment is to validate the use of a policy learnt offline on a real AUV. This learning phase occurs offline within the continuous simulation environment which had been discretized into a grid-world learning problem. Presented results show the system's convergence to a global optimal solution whilst following both sub-optimal policies during simulation. Future work is introduced, after discussion of our results, to enable the system to be used in a real world application. The results presented, therefore, form the basis for future comparative analysis of the necessary improvements such as function approximation of the state space.\",\"PeriodicalId\":368693,\"journal\":{\"name\":\"2014 Oceans - St. John's\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 Oceans - St. John's\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/OCEANS.2014.7003088\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Oceans - St. John's","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/OCEANS.2014.7003088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

提出了离线强化学习在水下领域的应用。我们提出并评估了将q -学习算法集成到自主水下航行器(AUV)中,用于模拟中动作值函数的学习。提出了三个独立的实验。第一个比较了两种搜索策略:ε -最少访问和随机行为,相对于收敛时间。第二个实验展示了学习折扣因子(gamma)对ε -最小访问搜索策略收敛时间的影响。最后的实验是验证离线学习策略在真实AUV上的使用。这一学习阶段发生在连续仿真环境下的离线状态下,该环境被离散化为网格世界的学习问题。仿真结果表明,系统在遵循两个次优策略的同时收敛到全局最优解。在讨论了我们的结果之后,介绍了未来的工作,以使系统能够在现实世界的应用中使用。因此,所提出的结果为将来对必要的改进(如状态空间的函数逼近)进行比较分析奠定了基础。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Evaluation of Q-learning for search and inspect missions using underwater vehicles
An application for offline Reinforcement Learning in the underwater domain is proposed. We present and evaluate the integration of the Q-learning algorithm into an Autonomous Underwater Vehicle (AUV) for learning the action-value function in simulation. Three separate experiments are presented. The first compares two search policies: the ε - least visited, and random action, with respect to convergence time. The second experiment presents the effect of the learning discount factor, gamma, on the convergence time of the ε - least visited search policy. The final experiment is to validate the use of a policy learnt offline on a real AUV. This learning phase occurs offline within the continuous simulation environment which had been discretized into a grid-world learning problem. Presented results show the system's convergence to a global optimal solution whilst following both sub-optimal policies during simulation. Future work is introduced, after discussion of our results, to enable the system to be used in a real world application. The results presented, therefore, form the basis for future comparative analysis of the necessary improvements such as function approximation of the state space.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信