{"title":"半监督学习在复杂手臂运动控制任务中的应用","authors":"Daniel Burfoot, Y. Kuniyoshi","doi":"10.1109/ROBIO.2009.4913257","DOIUrl":null,"url":null,"abstract":"In real world learning problems it is often the case that while the amount of labeled training data is limited, the amount of raw, unlabeled data available is vast. It is thus beneficial to develop ways of exploiting the large amount of unlabeled data to maximize the utility of each labeled sample. We examine this “semi-supervised” learning problem in the context of a flexible arm with complex dynamics. The goal of the learning process is to predict a reward value R, which evaluates the system's performance on a given task, from an input motor command M. We assume that the number of trials for which the reward is given is strictly limited. This makes it difficult to learn the function M → R, because of the complex dynamics of the arm. We also assume that there are a large number of unsupervised trials which give information about the trajectory I that results from a particular motor command M. Our method is to first learn a mapping from the motor command M to the trajectory I from the unsupervised samples, and then learn a mapping from I to the reward value R from the supervised samples. We show that the indirect learning process M → I → R achieves superior performance to the direct process M → R, under a wide variety of conditions.","PeriodicalId":321332,"journal":{"name":"2008 IEEE International Conference on Robotics and Biomimetics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semi-supervised learning in a complex arm motor control task\",\"authors\":\"Daniel Burfoot, Y. Kuniyoshi\",\"doi\":\"10.1109/ROBIO.2009.4913257\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In real world learning problems it is often the case that while the amount of labeled training data is limited, the amount of raw, unlabeled data available is vast. It is thus beneficial to develop ways of exploiting the large amount of unlabeled data to maximize the utility of each labeled sample. We examine this “semi-supervised” learning problem in the context of a flexible arm with complex dynamics. The goal of the learning process is to predict a reward value R, which evaluates the system's performance on a given task, from an input motor command M. We assume that the number of trials for which the reward is given is strictly limited. This makes it difficult to learn the function M → R, because of the complex dynamics of the arm. We also assume that there are a large number of unsupervised trials which give information about the trajectory I that results from a particular motor command M. Our method is to first learn a mapping from the motor command M to the trajectory I from the unsupervised samples, and then learn a mapping from I to the reward value R from the supervised samples. 
We show that the indirect learning process M → I → R achieves superior performance to the direct process M → R, under a wide variety of conditions.\",\"PeriodicalId\":321332,\"journal\":{\"name\":\"2008 IEEE International Conference on Robotics and Biomimetics\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Robotics and Biomimetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBIO.2009.4913257\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Robotics and Biomimetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO.2009.4913257","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semi-supervised learning in a complex arm motor control task
In real-world learning problems, it is often the case that while the amount of labeled training data is limited, the amount of raw, unlabeled data available is vast. It is thus beneficial to develop ways of exploiting the large amount of unlabeled data to maximize the utility of each labeled sample. We examine this “semi-supervised” learning problem in the context of a flexible arm with complex dynamics. The goal of the learning process is to predict, from an input motor command M, a reward value R that evaluates the system's performance on a given task. We assume that the number of trials for which the reward is given is strictly limited; because of the arm's complex dynamics, this makes the function M → R difficult to learn. We also assume that there is a large number of unsupervised trials that give information about the trajectory I resulting from a particular motor command M. Our method is to first learn a mapping from the motor command M to the trajectory I from the unsupervised samples, and then learn a mapping from I to the reward value R from the supervised samples. We show that the indirect learning process M → I → R outperforms the direct process M → R under a wide variety of conditions.
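To make the two-stage scheme concrete, here is a minimal sketch in Python. The paper does not specify the function approximators or the arm model, so the MLP regressors and the toy `simulate` and `reward` functions below are illustrative assumptions; the point is only how the indirect M → I → R predictor is assembled from many unlabeled (M, I) trials and few labeled (I, R) trials, and compared against a direct M → R baseline trained on the same labeled budget.

```python
# Sketch of indirect (M -> I -> R) vs. direct (M -> R) learning.
# The arm dynamics, reward, and regressor choices are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for the flexible arm: a fixed nonlinear map
# from a 4-dim motor command M to an 8-dim trajectory code I.
W = rng.normal(size=(4, 8))

def simulate(M):
    """Toy 'arm dynamics': motor commands -> trajectory features."""
    return np.sin(M @ W)

def reward(I):
    """Toy reward: evaluates the resulting trajectory."""
    return np.cos(I.sum(axis=1))

# Many unsupervised trials: (M, I) pairs with no reward observed.
M_unsup = rng.normal(size=(5000, 4))
I_unsup = simulate(M_unsup)

# Few supervised trials: (M, I, R) triples.
M_sup = rng.normal(size=(50, 4))
I_sup = simulate(M_sup)
R_sup = reward(I_sup)

# Stage 1: learn M -> I from the abundant unsupervised data.
f = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
f.fit(M_unsup, I_unsup)

# Stage 2: learn I -> R from the scarce supervised data.
g = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
g.fit(I_sup, R_sup)

# Baseline: learn M -> R directly from the same scarce supervised data.
direct = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
direct.fit(M_sup, R_sup)

# Compare both predictors on held-out motor commands.
M_test = rng.normal(size=(1000, 4))
R_true = reward(simulate(M_test))
mse_indirect = np.mean((g.predict(f.predict(M_test)) - R_true) ** 2)
mse_direct = np.mean((direct.predict(M_test) - R_true) ** 2)
print(f"indirect M->I->R MSE: {mse_indirect:.4f}")
print(f"direct   M->R   MSE: {mse_direct:.4f}")
```

The design intuition this sketch captures: the hard, nonlinear part of the problem (the arm dynamics M → I) is learned from thousands of cheap unsupervised trials, so each of the few reward-labeled trials only has to pin down the typically simpler mapping I → R.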