{"title":"Curiosity-Driven Reinforced Learning of Undesired Actions in Autonomous Intelligent Agents","authors":"Christopher Rosser, Khalid H. Abed","doi":"10.1109/SAMI50585.2021.9378666","DOIUrl":null,"url":null,"abstract":"Autonomous exploring agents are encouraged to explore unknown states in an environment when equipped with an intrinsic motivating factor such as curiosity. Although intrinsic motivation is a useful mechanism for an autonomous exploring agent in an environment that provides sparse rewards, it doubles as a mechanism for causing the agents to act in undesirable ways. In this paper, we show that highly-curious agents, attached with neural networks trained with the Machine Learning Agent Toolkit's (ML-Agents) implementation of the Proximal Policy Optimization (PPO) algorithm, and Intrinsic Curiosity Module (ICM), learn undesirable or reckless behaviors relatively early in the training process. We also show that strong correlations in the PPO training statistics of misbehaving agents may indicate when an actual human should intervene for safety during the RL training process.","PeriodicalId":402414,"journal":{"name":"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SAMI50585.2021.9378666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Autonomous exploring agents are encouraged to explore unknown states in an environment when equipped with an intrinsic motivating factor such as curiosity. Although intrinsic motivation is a useful mechanism for an autonomous exploring agent in an environment that provides sparse rewards, it can also cause agents to act in undesirable ways. In this paper, we show that highly curious agents, whose policies are neural networks trained with the Unity Machine Learning Agents Toolkit's (ML-Agents) implementation of the Proximal Policy Optimization (PPO) algorithm and Intrinsic Curiosity Module (ICM), learn undesirable or reckless behaviors relatively early in the training process. We also show that strong correlations in the PPO training statistics of misbehaving agents may indicate when a human should intervene for safety during the RL training process.
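For readers unfamiliar with the curiosity mechanism the abstract refers to, the following is a minimal sketch of the ICM-style intrinsic reward: the agent receives a bonus proportional to how badly its forward model predicted the features of the next state, so poorly explored (hard-to-predict) transitions look attractive. This is an illustrative simplification, not the authors' or ML-Agents' implementation; the scaling factor `eta` and the feature vectors here are assumptions for the example.

```python
import numpy as np

def intrinsic_reward(phi_next_pred: np.ndarray, phi_next: np.ndarray,
                     eta: float = 0.5) -> float:
    """ICM-style curiosity bonus: scaled squared error between the forward
    model's predicted next-state features and the actual next-state features.

    eta is an illustrative scaling coefficient, not a value from the paper.
    """
    return eta * 0.5 * float(np.sum((phi_next_pred - phi_next) ** 2))

# A perfectly predicted (i.e. familiar) transition earns no curiosity bonus:
familiar = intrinsic_reward(np.zeros(4), np.zeros(4))      # 0.0

# A surprising transition earns a positive bonus, which is what can push a
# highly curious agent toward reckless behavior when the bonus is weighted
# too strongly relative to the extrinsic reward:
surprising = intrinsic_reward(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
```

In ML-Agents, the relative weight of this curiosity signal against the extrinsic reward is a trainer hyperparameter, which is why "highly-curious" agents (large curiosity strength) can be produced deliberately for experiments like the one described above.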