Curiosity-Driven Reinforced Learning of Undesired Actions in Autonomous Intelligent Agents

Christopher Rosser, Khalid H. Abed
{"title":"Curiosity-Driven Reinforced Learning of Undesired Actions in Autonomous Intelligent Agents","authors":"Christopher Rosser, Khalid H. Abed","doi":"10.1109/SAMI50585.2021.9378666","DOIUrl":null,"url":null,"abstract":"Autonomous exploring agents are encouraged to explore unknown states in an environment when equipped with an intrinsic motivating factor such as curiosity. Although intrinsic motivation is a useful mechanism for an autonomous exploring agent in an environment that provides sparse rewards, it doubles as a mechanism for causing the agents to act in undesirable ways. In this paper, we show that highly-curious agents, attached with neural networks trained with the Machine Learning Agent Toolkit's (ML-Agents) implementation of the Proximal Policy Optimization (PPO) algorithm, and Intrinsic Curiosity Module (ICM), learn undesirable or reckless behaviors relatively early in the training process. We also show that strong correlations in the PPO training statistics of misbehaving agents may indicate when an actual human should intervene for safety during the RL training process.","PeriodicalId":402414,"journal":{"name":"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SAMI50585.2021.9378666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Autonomous exploring agents are encouraged to explore unknown states in an environment when they are equipped with an intrinsic motivating factor such as curiosity. Although intrinsic motivation is useful for an autonomous exploring agent in an environment that provides only sparse rewards, it can also drive the agent to act in undesirable ways. In this paper, we show that highly curious agents, controlled by neural networks trained with the Machine Learning Agents Toolkit's (ML-Agents) implementation of the Proximal Policy Optimization (PPO) algorithm and the Intrinsic Curiosity Module (ICM), learn undesirable or reckless behaviors relatively early in the training process. We also show that strong correlations in the PPO training statistics of misbehaving agents may indicate when a human should intervene for safety during reinforcement learning (RL) training.
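
The curiosity signal the abstract refers to comes from an Intrinsic Curiosity Module layered on top of PPO's extrinsic reward. As a rough illustration of that mechanism, the sketch below computes a forward-model curiosity reward in the spirit of ICM (Pathak et al., 2017); it is not the ML-Agents implementation, it omits the inverse-model loss that ICM uses to train the encoder, and the layer sizes and `beta` curiosity strength are illustrative assumptions.

```python
# Minimal sketch of an ICM-style curiosity reward, assuming flat vector
# observations and actions. Not the ML-Agents code; hyperparameters are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CuriosityModule(nn.Module):
    """Forward-model curiosity: intrinsic reward = prediction error in feature space."""

    def __init__(self, obs_dim, act_dim, feat_dim=64, beta=0.2):
        super().__init__()
        self.beta = beta  # curiosity strength: scales the intrinsic reward
        # Encoder maps raw observations into a learned feature space.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
        # Forward model predicts the next feature vector from (feature, action).
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )

    @torch.no_grad()
    def intrinsic_reward(self, obs, action, next_obs):
        """Per-transition curiosity reward: scaled forward-prediction error."""
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        pred_next = self.forward_model(torch.cat([phi, action], dim=-1))
        return self.beta * F.mse_loss(pred_next, phi_next, reduction="none").mean(dim=-1)


# Example: batch of 5 transitions with 8-dimensional observations and 2-dimensional actions.
icm = CuriosityModule(obs_dim=8, act_dim=2)
obs, act, nxt = torch.randn(5, 8), torch.randn(5, 2), torch.randn(5, 8)
print(icm.intrinsic_reward(obs, act, nxt))  # larger where the forward model is more surprised
```

In ML-Agents itself, curiosity is enabled as a reward signal in the trainer configuration alongside the extrinsic reward, with its own strength and discount, so a highly curious agent is typically produced by raising that strength rather than by writing a module like the one above.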
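
The paper's second claim is that strong correlations in the PPO training statistics of misbehaving agents can signal when a human should intervene. The fragment below is one hypothetical way to operationalize that idea: it correlates the per-episode curiosity reward with a per-episode count of undesired events. The `undesired_events` series and the 0.8 threshold are assumptions for illustration, not statistics or values taken from the paper.

```python
# Hypothetical intervention check over per-episode training statistics.
import numpy as np


def should_intervene(curiosity_rewards, undesired_events, threshold=0.8):
    """Flag a run for human review when the per-episode curiosity reward is
    strongly (Pearson) correlated with a count of undesired events, e.g. collisions."""
    r = np.corrcoef(curiosity_rewards, undesired_events)[0, 1]
    return abs(r) >= threshold


# Example: curiosity reward and collision count rising together early in training.
print(should_intervene([0.2, 0.5, 0.9, 1.4], [0, 1, 3, 6]))  # True
```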