Rinta Hasegawa, Yosuke Fukuchi, Kohei Okuoka, M. Imai
Proceedings of the 10th International Conference on Human-Agent Interaction. Published 2022-12-05. DOI: 10.1145/3527188.3561917
Advantage Mapping: Learning Operation Mapping for User-Preferred Manipulation by Extracting Scenes with Advantage Function
When a user manipulates a system, an input given through an interface (an operation) is converted into the user's intended action according to the mapping that links operations to actions, which we call an "operation mapping". Although many operation mappings are created by designers who assume how a typical user would operate the system, the optimal mapping may vary from user to user, and a designer cannot prepare every possible operation mapping in advance. One approach to this problem is to learn an operation mapping autonomously during operation; however, existing methods require the scenes used for learning the mapping to be prepared manually. We propose advantage mapping, which enables efficient learning of operation mappings. Based on the idea that scenes in which the user's desired action is predictable are useful for learning operation mappings, advantage mapping extracts scenes according to the magnitude of the entropy in the output of the action-value function acquired through reinforcement learning. In our experiment, the user's ideal operation mapping was obtained more accurately from the scenes selected by advantage mapping than from learning through actual play.
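The core selection step described in the abstract, ranking scenes by the entropy of the policy induced by the action-value function and keeping the most predictable ones, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, the softmax temperature, and the toy Q-values are assumptions introduced for the example.

```python
import math

def entropy_of_q(q_values, temperature=1.0):
    """Shannon entropy of the softmax distribution over action values.
    Low entropy means one action clearly dominates, i.e. the user's
    desired action in this scene is predictable."""
    m = max(q / temperature for q in q_values)  # for numerical stability
    exps = [math.exp(q / temperature - m) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def extract_scenes(scene_q_values, k):
    """Return indices of the k scenes with the lowest action-distribution
    entropy, as candidates for learning the operation mapping."""
    ranked = sorted(range(len(scene_q_values)),
                    key=lambda i: entropy_of_q(scene_q_values[i]))
    return ranked[:k]

# Toy example: three scenes, each with Q-values for three actions.
scenes = [
    [5.0, 0.1, 0.2],   # one clearly best action -> low entropy
    [1.0, 1.0, 1.0],   # all actions equal -> maximal entropy
    [2.0, 1.5, 0.5],   # moderately peaked
]
print(extract_scenes(scenes, 2))  # -> [0, 2]
```

The sketch assumes Q-values are already available per scene (e.g. from a trained RL agent); in practice the entropy threshold or the number of scenes k would be tuned to how much labeling effort the user can afford.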