{"title":"半监督学习在复杂手臂运动控制任务中的应用","authors":"Daniel Burfoot, Y. Kuniyoshi","doi":"10.1109/ROBIO.2009.4913257","DOIUrl":null,"url":null,"abstract":"In real world learning problems it is often the case that while the amount of labeled training data is limited, the amount of raw, unlabeled data available is vast. It is thus beneficial to develop ways of exploiting the large amount of unlabeled data to maximize the utility of each labeled sample. We examine this “semi-supervised” learning problem in the context of a flexible arm with complex dynamics. The goal of the learning process is to predict a reward value R, which evaluates the system's performance on a given task, from an input motor command M. We assume that the number of trials for which the reward is given is strictly limited. This makes it difficult to learn the function M → R, because of the complex dynamics of the arm. We also assume that there are a large number of unsupervised trials which give information about the trajectory I that results from a particular motor command M. Our method is to first learn a mapping from the motor command M to the trajectory I from the unsupervised samples, and then learn a mapping from I to the reward value R from the supervised samples. We show that the indirect learning process M → I → R achieves superior performance to the direct process M → R, under a wide variety of conditions.","PeriodicalId":321332,"journal":{"name":"2008 IEEE International Conference on Robotics and Biomimetics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semi-supervised learning in a complex arm motor control task\",\"authors\":\"Daniel Burfoot, Y. Kuniyoshi\",\"doi\":\"10.1109/ROBIO.2009.4913257\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In real world learning problems it is often the case that while the amount of labeled training data is limited, the amount of raw, unlabeled data available is vast. It is thus beneficial to develop ways of exploiting the large amount of unlabeled data to maximize the utility of each labeled sample. We examine this “semi-supervised” learning problem in the context of a flexible arm with complex dynamics. The goal of the learning process is to predict a reward value R, which evaluates the system's performance on a given task, from an input motor command M. We assume that the number of trials for which the reward is given is strictly limited. This makes it difficult to learn the function M → R, because of the complex dynamics of the arm. We also assume that there are a large number of unsupervised trials which give information about the trajectory I that results from a particular motor command M. Our method is to first learn a mapping from the motor command M to the trajectory I from the unsupervised samples, and then learn a mapping from I to the reward value R from the supervised samples. 
We show that the indirect learning process M → I → R achieves superior performance to the direct process M → R, under a wide variety of conditions.\",\"PeriodicalId\":321332,\"journal\":{\"name\":\"2008 IEEE International Conference on Robotics and Biomimetics\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Robotics and Biomimetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBIO.2009.4913257\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Robotics and Biomimetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO.2009.4913257","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semi-supervised learning in a complex arm motor control task
In real-world learning problems, it is often the case that while the amount of labeled training data is limited, the amount of raw, unlabeled data available is vast. It is thus beneficial to develop ways of exploiting the large amount of unlabeled data to maximize the utility of each labeled sample. We examine this “semi-supervised” learning problem in the context of a flexible arm with complex dynamics. The goal of the learning process is to predict, from an input motor command M, a reward value R that evaluates the system's performance on a given task. We assume that the number of trials for which the reward is given is strictly limited; because of the arm's complex dynamics, this makes the function M → R difficult to learn. We also assume that there is a large number of unsupervised trials that give information about the trajectory I resulting from a particular motor command M. Our method is to first learn a mapping from the motor command M to the trajectory I from the unsupervised samples, and then learn a mapping from I to the reward value R from the supervised samples. We show that the indirect learning process M → I → R outperforms the direct process M → R under a wide variety of conditions.
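To make the two-stage scheme concrete, here is a minimal sketch in Python. The paper does not specify the function approximators or the arm model, so the MLP regressors and the toy `simulate` and `reward` functions below are illustrative assumptions; the point is only how the indirect M → I → R predictor is assembled from many unlabeled (M, I) trials and few labeled (I, R) trials, and compared against a direct M → R baseline trained on the same labeled budget.

```python
# Sketch of indirect (M -> I -> R) vs. direct (M -> R) learning.
# The arm dynamics, reward, and regressor choices are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for the flexible arm: a fixed nonlinear map
# from a 4-dim motor command M to an 8-dim trajectory code I.
W = rng.normal(size=(4, 8))

def simulate(M):
    """Toy 'arm dynamics': motor commands -> trajectory features."""
    return np.sin(M @ W)

def reward(I):
    """Toy reward: evaluates the resulting trajectory."""
    return np.cos(I.sum(axis=1))

# Many unsupervised trials: (M, I) pairs with no reward observed.
M_unsup = rng.normal(size=(5000, 4))
I_unsup = simulate(M_unsup)

# Few supervised trials: (M, I, R) triples.
M_sup = rng.normal(size=(50, 4))
I_sup = simulate(M_sup)
R_sup = reward(I_sup)

# Stage 1: learn M -> I from the abundant unsupervised data.
f = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
f.fit(M_unsup, I_unsup)

# Stage 2: learn I -> R from the scarce supervised data.
g = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
g.fit(I_sup, R_sup)

# Baseline: learn M -> R directly from the same scarce supervised data.
direct = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
direct.fit(M_sup, R_sup)

# Compare both predictors on held-out motor commands.
M_test = rng.normal(size=(1000, 4))
R_true = reward(simulate(M_test))
mse_indirect = np.mean((g.predict(f.predict(M_test)) - R_true) ** 2)
mse_direct = np.mean((direct.predict(M_test) - R_true) ** 2)
print(f"indirect M->I->R MSE: {mse_indirect:.4f}")
print(f"direct   M->R   MSE: {mse_direct:.4f}")
```

The design intuition this sketch captures: the hard, nonlinear part of the problem (the arm dynamics M → I) is learned from thousands of cheap unsupervised trials, so each of the few reward-labeled trials only has to pin down the typically simpler mapping I → R.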