Reinforcement Learning for a Human-Following Robot
Yang Wang, David Lee
ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication, 2006-09-01
DOI: 10.1109/ROMAN.2006.314435
https://doi.org/10.1109/ROMAN.2006.314435
Citations: 6
Abstract
This paper discusses a mobile robot that follows a person. It focuses on a less-researched aspect: interacting with the human's attitude through the robot's movements. The reward, which indicates the human's attitude, is used to train a network so that the robot learns an appropriate position relative to the person. The algorithm presented in this study overcomes the difficulty that the reward score given by the human has no gradient across large parts of the input space. The network works online and can adapt to unpredictable changes in the person's preference.
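The "no gradient" difficulty mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not describe. It only assumes a hypothetical human reward that is 1.0 inside a comfort band of following distances and 0.0 everywhere else, and shows that a local slope estimate carries no information outside the band, so some form of exploration is needed. All names and numbers here (`human_reward`, `probe_for_signal`, the 1.0 m preferred distance) are illustrative assumptions.

```python
def human_reward(distance_m, preferred_m=1.0, tolerance_m=0.4):
    """Hypothetical reward: 1.0 inside the comfort band, 0.0 elsewhere."""
    return 1.0 if abs(distance_m - preferred_m) <= tolerance_m else 0.0


def finite_difference(f, x, eps=1e-3):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def probe_for_signal(reward_fn, start_m, step_m=0.25, max_m=4.0):
    """Try distances at growing offsets from start_m until the reward improves.

    Returns the first probed distance whose reward exceeds the reward at
    start_m, or None if no such distance exists in [0, max_m].
    """
    base = reward_fn(start_m)
    k = 1
    while k * step_m <= max_m:
        # Probe the nearer side first, then the farther side.
        for candidate in (start_m - k * step_m, start_m + k * step_m):
            if 0.0 <= candidate <= max_m and reward_fn(candidate) > base:
                return candidate
        k += 1
    return None


if __name__ == "__main__":
    start = 3.0  # assumed initial following distance in metres
    print("reward at start:", human_reward(start))
    print("slope at start:", finite_difference(human_reward, start))
    print("first improving distance:", probe_for_signal(human_reward, start))
```

At the assumed starting distance of 3.0 m the slope estimate is zero, so a purely gradient-driven update would not move the robot. The probing loop stands in for whatever exploration mechanism a learner uses to find the region where the human's feedback changes.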