{"title":"从交互式强化学习中不可靠的人类行为建议中学习","authors":"L. Scherf, Cigdem Turan, Dorothea Koert","doi":"10.1109/Humanoids53995.2022.10000078","DOIUrl":null,"url":null,"abstract":"Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and enable learning in more complex environments. Human action advice is here one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that account for inaccurate advice compute trust in human action advice independent of a state. This can lead to problems in practical cases, where human input might be inaccurate only in some states while it is still useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. Here, we combine three potential indicator signals for unreliable advice, i.e. consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.","PeriodicalId":180816,"journal":{"name":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Learning from Unreliable Human Action Advice in Interactive Reinforcement Learning\",\"authors\":\"L. Scherf, Cigdem Turan, Dorothea Koert\",\"doi\":\"10.1109/Humanoids53995.2022.10000078\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and enable learning in more complex environments. Human action advice is here one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that account for inaccurate advice compute trust in human action advice independent of a state. This can lead to problems in practical cases, where human input might be inaccurate only in some states while it is still useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. Here, we combine three potential indicator signals for unreliable advice, i.e. consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. 
We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.\",\"PeriodicalId\":180816,\"journal\":{\"name\":\"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/Humanoids53995.2022.10000078\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Humanoids53995.2022.10000078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Learning from Unreliable Human Action Advice in Interactive Reinforcement Learning
Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and to enable learning in more complex environments. Human action advice is one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that do account for inaccurate advice compute trust in human action advice independently of the state. This can lead to problems in practical cases, where human input might be inaccurate only in some states while still being useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. We combine three potential indicator signals for unreliable advice: consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.
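To make the idea of state-dependent trust concrete, the following is a minimal, hypothetical Python sketch of an advice-aware Q-learning agent. It is not the authors' algorithm: the class name, the trust-update rules, and the way the three signals (consistency, retrospective optimality, behavioral uncertainty cues) are combined are illustrative assumptions chosen only to show how trust in advice could be tracked per state rather than globally.

```python
# Hypothetical sketch: Q-learning with state-dependent trust in human action advice.
# The trust-update heuristics below are illustrative assumptions, not the paper's method.
import random
from collections import defaultdict


class AdviceAwareQLearner:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1, trust_lr=0.2):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.trust_lr = trust_lr
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q-values per state
        self.trust = defaultdict(lambda: 0.5)            # per-state trust in advice
        self.last_advice = {}                            # last advised action per state

    def select_action(self, state, advice=None, uncertainty_cue=False):
        """Follow advice with probability equal to the current trust in this state."""
        if advice is not None:
            # Consistency signal: repeated identical advice in a state raises trust.
            consistent = self.last_advice.get(state) == advice
            self.last_advice[state] = advice
            # Behavioral cue signal: observed hesitation/uncertainty lowers trust.
            signal = 0.5 + 0.25 * consistent - 0.25 * uncertainty_cue
            self.trust[state] += self.trust_lr * (signal - self.trust[state])
            if random.random() < self.trust[state]:
                return advice
        # Otherwise fall back to epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return max(range(self.n_actions), key=lambda a: qs[a])

    def update(self, state, action, reward, next_state, advised):
        """Standard Q-learning update plus a retrospective-optimality trust update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        if advised:
            # Retrospective optimality signal: advice that matches the greedy action
            # in hindsight increases trust for this state, otherwise decreases it.
            greedy = max(range(self.n_actions), key=lambda a: self.q[state][a])
            hit = 1.0 if action == greedy else 0.0
            self.trust[state] += self.trust_lr * (hit - self.trust[state])
```

In this sketch, trust is keyed by state, so advice that is consistently poor in one part of a gridworld is gradually ignored there while remaining influential in states where it has proven reliable; this is the core distinction from a state-independent baseline that maintains a single global trust value.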