Reinforcement Learning for a Human-Following Robot

ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication, Pub Date: 2006-09-01, DOI: 10.1109/ROMAN.2006.314435

Yang Wang, David Lee


Citations: 6

Abstract

This paper discusses a mobile robot that follows a person. It focuses on a less-researched aspect: interaction with the human's attitude through the robot's movements. A reward, which indicates the human's attitude, is used to train a network so that the robot learns an appropriate position relative to the person. The algorithm presented in this study overcomes the difficulty that the feedback reward score given by the human has no gradient throughout large parts of the input space. The network works online and can adapt to unpredictable changes in the person's preference.
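
The "no gradient" difficulty mentioned in the abstract can be illustrated with a small sketch. It is not the algorithm from the paper: the reward shape, the 1.0 m to 1.5 m comfort band, and the names `human_reward` and `finite_difference_gradient` are all assumptions made for illustration. The sketch only shows that a score which is constant over most following distances gives a local gradient estimate of zero almost everywhere, so plain gradient ascent would have nothing to follow.

```python
def human_reward(distance_m: float) -> float:
    """Hypothetical human feedback: 1.0 inside an assumed comfort band, 0.0 elsewhere."""
    return 1.0 if 1.0 <= distance_m <= 1.5 else 0.0


def finite_difference_gradient(f, x: float, eps: float = 0.01) -> float:
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Sample following distances from 0.25 m to 4.0 m in 0.25 m steps.
distances = [0.25 * i for i in range(1, 17)]
gradients = [finite_difference_gradient(human_reward, d) for d in distances]

# Fraction of sampled distances where the local gradient estimate is exactly zero.
flat_fraction = sum(g == 0.0 for g in gradients) / len(gradients)
print(f"zero-gradient fraction: {flat_fraction}")
```

In this toy setting the gradient estimate is nonzero only at the two edges of the band, which is why a learner that relies on local slope alone would stall on the plateaus.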
