Learning from Unreliable Human Action Advice in Interactive Reinforcement Learning

L. Scherf, Cigdem Turan, Dorothea Koert
{"title":"Learning from Unreliable Human Action Advice in Interactive Reinforcement Learning","authors":"L. Scherf, Cigdem Turan, Dorothea Koert","doi":"10.1109/Humanoids53995.2022.10000078","DOIUrl":null,"url":null,"abstract":"Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and enable learning in more complex environments. Human action advice is here one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that account for inaccurate advice compute trust in human action advice independent of a state. This can lead to problems in practical cases, where human input might be inaccurate only in some states while it is still useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. Here, we combine three potential indicator signals for unreliable advice, i.e. consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. 
We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.","PeriodicalId":180816,"journal":{"name":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Humanoids53995.2022.10000078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and to enable learning in more complex environments. Human action advice is one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that do account for inaccurate advice compute trust in human action advice independently of the state. This can lead to problems in practical cases, where human input might be inaccurate only in some states while still being useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. We combine three potential indicator signals for unreliable advice, i.e., consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. We show that our method outperforms a state-independent baseline, and we analyze occurrences of behavioral cues related to unreliable advice.
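The core idea of the abstract — maintaining a per-state trust estimate that is updated from several indicator signals and used to decide whether to follow human advice — can be illustrated with a minimal sketch. Note that the abstract does not specify the actual combination rule or update equations; the `StateTrustAgent` class, its signal weights, and its learning rates below are hypothetical assumptions chosen only to make the state-dependent mechanism concrete, not the authors' algorithm.

```python
import random
from collections import defaultdict


class StateTrustAgent:
    """Q-learning agent with a per-state trust estimate for human advice.

    Hypothetical sketch: the paper combines three indicator signals
    (consistency of advice, retrospective optimality, behavioral
    uncertainty cues); here they are folded into one scalar per state.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)             # (state, action) -> value
        self.trust = defaultdict(lambda: 0.5)   # state -> trust in [0, 1]

    def update_trust(self, state, consistency, retro_optimal, uncertain_cue):
        # Weighted combination of the three indicator signals; the
        # weights and step size are illustrative assumptions.
        signal = 0.4 * consistency + 0.4 * retro_optimal - 0.2 * uncertain_cue
        t = self.trust[state] + 0.2 * (signal - self.trust[state])
        self.trust[state] = min(1.0, max(0.0, t))

    def select_action(self, state, advice=None):
        # Follow the human's advice with probability equal to the
        # state-dependent trust; otherwise act epsilon-greedily on Q.
        if advice is not None and random.random() < self.trust[state]:
            return advice
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update_q(self, s, a, r, s_next):
        # Standard Q-learning update, independent of the trust model.
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        td = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td
```

Because trust is keyed by state, advice that is consistently wrong in one region of the state space (e.g. one gridworld cell) lowers trust only there, while advice elsewhere continues to be followed — the property the abstract argues state-independent baselines lack.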