用反事实解释加速TAMER的学习

2022 IEEE International Conference on Development and Learning (ICDL) Pub Date : 2021-08-03 DOI:10.1109/ICDL53763.2022.9962222

Jakob Karalus, F. Lindner

{"title":"用反事实解释加速TAMER的学习","authors":"Jakob Karalus, F. Lindner","doi":"10.1109/ICDL53763.2022.9962222","DOIUrl":null,"url":null,"abstract":"The capability to interactively learn from human feedback would enable agents in new settings. For example, even novice users could train service robots in new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) combines human feedback and Reinforcement Learning (RL) techniques. State-of-the-art interactive learning techniques suffer from slow learning speed, thus leading to a frustrating experience for the human. We approach this problem by extending the HRL framework TAMER for evaluative feedback with the possibility to enhance human feedback with two different types of counterfactual explanations (action and state based). We experimentally show that our extensions improve the speed of learning.","PeriodicalId":274171,"journal":{"name":"2022 IEEE International Conference on Development and Learning (ICDL)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Accelerating the Learning of TAMER with Counterfactual Explanations\",\"authors\":\"Jakob Karalus, F. Lindner\",\"doi\":\"10.1109/ICDL53763.2022.9962222\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The capability to interactively learn from human feedback would enable agents in new settings. For example, even novice users could train service robots in new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) combines human feedback and Reinforcement Learning (RL) techniques. State-of-the-art interactive learning techniques suffer from slow learning speed, thus leading to a frustrating experience for the human. We approach this problem by extending the HRL framework TAMER for evaluative feedback with the possibility to enhance human feedback with two different types of counterfactual explanations (action and state based). We experimentally show that our extensions improve the speed of learning.\",\"PeriodicalId\":274171,\"journal\":{\"name\":\"2022 IEEE International Conference on Development and Learning (ICDL)\",\"volume\":\"68 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Development and Learning (ICDL)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDL53763.2022.9962222\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Development and Learning (ICDL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDL53763.2022.9962222","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

从人类反馈中交互式学习的能力将使代理能够适应新的环境。例如，即使是新手用户也可以自然地、交互式地训练服务机器人完成新任务。人在环强化学习(HRL)结合了人的反馈和强化学习(RL)技术。最先进的交互式学习技术存在学习速度慢的问题，从而导致人们的学习体验令人沮丧。我们通过扩展HRL框架TAMER来解决这个问题，以评估反馈，并有可能通过两种不同类型的反事实解释(基于行动和基于状态)来增强人类反馈。我们的实验表明，我们的扩展提高了学习的速度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Accelerating the Learning of TAMER with Counterfactual Explanations

The capability to interactively learn from human feedback would enable agents in new settings. For example, even novice users could train service robots in new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) combines human feedback and Reinforcement Learning (RL) techniques. State-of-the-art interactive learning techniques suffer from slow learning speed, thus leading to a frustrating experience for the human. We approach this problem by extending the HRL framework TAMER for evaluative feedback with the possibility to enhance human feedback with two different types of counterfactual explanations (action and state based). We experimentally show that our extensions improve the speed of learning.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE International Conference on Development and Learning (ICDL)

自引率

0.00%

发文量