Covariance analysis as a measure of policy robustness

Nawid Jamali, Petar Kormushev, S. Ahmadzadeh, D. Caldwell
{"title":"Covariance analysis as a measure of policy robustness","authors":"Nawid Jamali, Petar Kormushev, S. Ahmadzadeh, D. Caldwell","doi":"10.1109/OCEANS-TAIPEI.2014.6964339","DOIUrl":null,"url":null,"abstract":"In this paper we propose covariance analysis as a metric for reinforcement learning to improve the robustness of a learned policy. The local optima found during the exploration are analyzed in terms of the total cumulative reward and the local behavior of the system in the neighborhood of the optima. The analysis is performed in the solution space to select a policy that exhibits robustness in uncertain and noisy environments. We demonstrate the utility of the method using our previously developed system where an autonomous underwater vehicle (AUV) has to recover from a thruster failure [1]. When a failure is detected the recovery system is invoked, which uses simulations to learn a new controller that utilizes the remaining functioning thrusters to achieve the goal of the AUV, that is, to reach a target position. In this paper, we use covariance analysis to examine the performance of the top, n, policies output by the previous algorithm. We propose a scoring metric that uses the output of the covariance analysis, the time it takes the AUV to reach the target position and the distance between the target position and the AUV's final position. The top polices are simulated in a noisy environment and evaluated using the proposed scoring metric to analyze the effect of noise on their performance. The policy that exhibits more tolerance to noise is selected. We show experimental results where covariance analysis successfully selects a more robust policy that was ranked lower by the original algorithm.","PeriodicalId":114739,"journal":{"name":"OCEANS 2014 - TAIPEI","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"OCEANS 2014 - TAIPEI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/OCEANS-TAIPEI.2014.6964339","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

In this paper we propose covariance analysis as a metric for reinforcement learning that improves the robustness of a learned policy. The local optima found during exploration are analyzed in terms of the total cumulative reward and the local behavior of the system in the neighborhood of each optimum. The analysis is performed in the solution space to select a policy that remains robust in uncertain and noisy environments. We demonstrate the utility of the method using our previously developed system in which an autonomous underwater vehicle (AUV) must recover from a thruster failure [1]. When a failure is detected, the recovery system is invoked; it uses simulations to learn a new controller that exploits the remaining functioning thrusters to achieve the goal of the AUV, namely, to reach a target position. In this paper, we use covariance analysis to examine the performance of the top n policies output by the previous algorithm. We propose a scoring metric that combines the output of the covariance analysis, the time the AUV takes to reach the target position, and the distance between the target position and the AUV's final position. The top policies are simulated in a noisy environment and evaluated using the proposed scoring metric to analyze the effect of noise on their performance, and the policy that exhibits the greatest tolerance to noise is selected. We show experimental results in which covariance analysis successfully selects a more robust policy that was ranked lower by the original algorithm.
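To make the selection procedure concrete, the following is a minimal Python sketch of the kind of noise-tolerance evaluation the abstract describes: each of the top n policies is rolled out repeatedly in a noisy simulator, the spread of final positions is summarized by the covariance of the outcomes, and a score combines that spread with time-to-target and final distance-to-target. The simulator interface (`simulate`), the weighted-sum form of the score, and the weights are all assumptions for illustration; the paper defines its own scoring metric and experimental setup.

```python
import numpy as np

def evaluate_policy(policy, simulate, n_trials=20):
    """Run `policy` repeatedly in a noisy simulator.

    `simulate(policy)` is a hypothetical interface assumed to return
    (final_position, time_to_target) for one noisy rollout, with
    final_position a 2-D or 3-D coordinate vector.
    """
    finals, times = [], []
    for _ in range(n_trials):
        final_pos, t = simulate(policy)
        finals.append(final_pos)
        times.append(t)
    return np.asarray(finals), np.asarray(times)

def robustness_score(finals, times, target, w=(1.0, 1.0, 1.0)):
    """Score a policy across noisy rollouts; lower is better.

    Combines (i) the spread of final positions, summarized by the trace
    of their covariance matrix (the covariance-analysis term), (ii) the
    mean time to reach the target, and (iii) the mean distance between
    the final position and the target. The weighted sum is an assumed
    combination, not the paper's exact formula.
    """
    spread = np.trace(np.cov(finals.T))
    mean_time = times.mean()
    mean_dist = np.linalg.norm(finals - target, axis=1).mean()
    w_cov, w_time, w_dist = w
    return w_cov * spread + w_time * mean_time + w_dist * mean_dist

def select_robust_policy(policies, simulate, target):
    """Pick the most noise-tolerant policy among the top-n candidates."""
    scores = []
    for p in policies:
        finals, times = evaluate_policy(p, simulate)
        scores.append(robustness_score(finals, times, target))
    return policies[int(np.argmin(scores))]
```

Under this scoring, a policy whose rollouts scatter widely in the noisy simulator accrues a large covariance term, so a candidate ranked lower by raw cumulative reward can still win the selection if its outcomes are tightly clustered around the target, which is the behavior the abstract reports.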