Simon Hangl, Emre Ugur, S. Szedmák, J. Piater, A. Ude
2015 International Conference on Advanced Robotics (ICAR), published 2015-07-27. DOI: 10.1109/ICAR.2015.7251511 (https://doi.org/10.1109/ICAR.2015.7251511)
Reactive, task-specific object manipulation by metric reinforcement learning
When manipulating dynamical systems, it is not trivial to design controllers that can cope with unpredictable changes in the system being manipulated. For example, in a pouring task, the target cup might start moving, or the agent may decide to change the amount of liquid during action execution. To cope with these situations, the robot should change its execution policy smoothly and in a timely manner, based on the requirements of the new situation. In this paper, we propose a robust method that allows the robot to react smoothly and successfully to such changes. The robot first learns a set of execution trajectories that can solve a number of tasks in different situations. When it encounters a novel situation, the robot smoothly adapts its trajectory to a new one generated by a weighted linear combination of the previously learned trajectories, where the weights are computed using a metric that depends on the task. This task-dependent metric is learned automatically in the state space of the robot, rather than in the motor control space, and is further optimized using a reinforcement learning (RL) framework. We discuss how our system can learn and model various manipulation tasks, such as pouring or reaching, and can successfully react to a wide range of perturbations introduced during task execution. We evaluated our method against ground truth with a synthetic trajectory dataset, and verified it in grasping and pouring tasks with a real robot.
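The blending step described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the metric is modeled as a diagonal weighting of state dimensions, the weights come from a Gaussian-style kernel on the metric distance, and the function names (`metric_distance_sq`, `blend_weights`, `blend_trajectories`) are hypothetical. The RL-based optimization of the metric is not shown.

```python
import math


def metric_distance_sq(a, b, metric_diag):
    """Squared distance between two state vectors under a diagonal metric.

    Assumption: the task-dependent metric is a per-dimension weighting.
    """
    return sum(m * (x - y) ** 2 for m, x, y in zip(metric_diag, a, b))


def blend_weights(query_state, stored_states, metric_diag):
    """Normalized weights: stored states closer to the query weigh more.

    Assumption: a Gaussian-style kernel exp(-d^2) on the metric distance.
    """
    d2 = [metric_distance_sq(query_state, s, metric_diag) for s in stored_states]
    # Subtract the smallest distance so at least one term is exp(0) = 1,
    # which keeps the normalizer away from zero for far-away queries.
    d2_min = min(d2)
    raw = [math.exp(-(d - d2_min)) for d in d2]
    total = sum(raw)
    return [r / total for r in raw]


def blend_trajectories(weights, trajectories):
    """Pointwise weighted linear combination of equally long trajectories.

    Each trajectory is a list of time steps; each step is a list of floats.
    """
    n_steps = len(trajectories[0])
    n_dims = len(trajectories[0][0])
    return [
        [
            sum(w * traj[t][d] for w, traj in zip(weights, trajectories))
            for d in range(n_dims)
        ]
        for t in range(n_steps)
    ]


if __name__ == "__main__":
    # Two stored situations (1-D state) and their learned trajectories.
    stored_states = [[0.0], [1.0]]
    trajectories = [
        [[0.0], [0.0]],
        [[1.0], [2.0]],
    ]
    # A query halfway between the stored situations gets equal weights.
    weights = blend_weights([0.5], stored_states, metric_diag=[1.0])
    print(weights)                                    # [0.5, 0.5]
    print(blend_trajectories(weights, trajectories))  # [[0.5], [1.0]]
```

In this sketch, setting an entry of `metric_diag` to zero makes the weights ignore that state dimension, which is one simple way a metric can be task-dependent. Because the weights are recomputed from the current state, calling `blend_weights` again after a perturbation shifts the blended trajectory toward the stored trajectories that best match the new situation.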