Authors: Elaine Schaertl Short, Adam Allevato, A. Thomaz
Published in: 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 468-477, 2019-03-01
DOI: 10.1109/HRI.2019.8673019 (https://doi.org/10.1109/HRI.2019.8673019)
SAIL: Simulation-Informed Active In-the-Wild Learning
Robots in real-world environments may need to adapt context-specific behaviors learned in one environment to new environments with new constraints. In many cases, co-present humans can provide the robot with information, but it may not be safe for them to provide hands-on demonstrations, and there may not be a dedicated supervisor to provide constant feedback. In this work, we present the SAIL (Simulation-Informed Active In-the-Wild Learning) algorithm for learning new approaches to manipulation skills starting from a single demonstration. In this three-step algorithm, the robot simulates task execution to choose new potential approaches; collects unsupervised data on task execution in the target environment; and finally chooses informative actions to show to co-present humans in order to obtain labels. Our approach enables a robot to learn new ways of executing two different tasks by using success/failure labels obtained from naïve users in a public space, performing 496 manipulation actions and collecting 163 labels from users in the wild over six deployments of 45 minutes to 1 hour each. We show that classifiers based on low-level sensor data can be used to accurately distinguish between successful and unsuccessful motions in a multi-step task ($p < 0.005$), even when trained in the wild. We also show that using the sensor data to choose which actions to sample is more effective than choosing the least-sampled action.
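The contrast drawn in the last sentence, between sensor-informed action selection and a least-sampled baseline, can be illustrated with a minimal sketch. The abstract does not state which selection criterion SAIL uses, so the sketch below swaps in entropy-based uncertainty sampling, a standard active-learning heuristic, as a stand-in. The function names, action names, label counts, and success probabilities are all hypothetical values chosen for illustration and are not taken from the paper.

```python
import math

# Hypothetical illustration only: the abstract does not specify SAIL's
# selection rule. Entropy-based uncertainty sampling is used here as a
# generic stand-in for "using the sensor data to choose which actions
# to sample".

def least_sampled(label_counts):
    """Baseline: pick the action that has received the fewest labels so far."""
    return min(label_counts, key=label_counts.get)

def binary_entropy(p):
    """Entropy in bits of a success probability p; it peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_uncertain(success_probs):
    """Informed choice: pick the action whose predicted outcome is least certain."""
    return max(success_probs, key=lambda a: binary_entropy(success_probs[a]))

# Made-up label counts per candidate approach.
label_counts = {"approach_a": 3, "approach_b": 1, "approach_c": 2}

# Made-up success probabilities, standing in for the output of a
# classifier trained on low-level sensor data.
success_probs = {"approach_a": 0.55, "approach_b": 0.95, "approach_c": 0.10}

baseline_choice = least_sampled(label_counts)
informed_choice = most_uncertain(success_probs)
print(baseline_choice, informed_choice)
```

In this toy setup the baseline asks for a label on the approach with the fewest labels, even though the classifier is already fairly confident about it, while the uncertainty-based rule asks about the approach whose outcome is hardest to predict. This is one plausible reading of why a sensor-informed choice could use scarce human labels more efficiently.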