{"title":"从零开始进行少量操作学习的深度网络","authors":"Yinghan Chen , Xueyang Yao , Bryan Tripp","doi":"10.1016/j.robot.2025.105056","DOIUrl":null,"url":null,"abstract":"<div><div>Deep networks can learn to process raw sensor data and produce control output for diverse tasks. However, to leverage these models’ flexibility and expressive power, past studies have trained them on massive amounts of data. In contrast, in this work, we attempt to train deep networks from scratch with very small datasets of object pose and gripper trajectories in manipulation-task demonstrations. The same setting has previously been used in programming-by-demonstration work with specialized statistical models such as task-parameterized Gaussian mixture models (TP-GMMs). We show that deep networks can learn manipulation tasks with performance that meets or exceeds that of past statistical models, given the same small numbers of demonstrations (5-30 in our tests), without any pretraining. Data augmentation is important for good performance and training the deep networks to be equivariant to frame transformations. Transformers performed slightly better than parameter-matched long-short-term-memory (LSTM) networks, and transformers had better training and inference times. In addition to testing these methods with physical tasks, we used a family of synthetic tasks to show that larger transformer models exhibit positive transfer across dozens of tasks, performing better on each task as they are trained on others. These results suggest that deep networks are potential alternatives to TP-GMM and related methods, having the advantage of needing fewer examples per task as the number of tasks grows. The results also suggest that the large data requirements of end-to-end manipulation learning are mainly due to perceptual factors, which may help to improve the design of end-to-end systems in the future.</div></div>","PeriodicalId":49592,"journal":{"name":"Robotics and Autonomous Systems","volume":"193 ","pages":"Article 105056"},"PeriodicalIF":4.3000,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep networks for few-shot manipulation learning from scratch\",\"authors\":\"Yinghan Chen , Xueyang Yao , Bryan Tripp\",\"doi\":\"10.1016/j.robot.2025.105056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Deep networks can learn to process raw sensor data and produce control output for diverse tasks. However, to leverage these models’ flexibility and expressive power, past studies have trained them on massive amounts of data. In contrast, in this work, we attempt to train deep networks from scratch with very small datasets of object pose and gripper trajectories in manipulation-task demonstrations. The same setting has previously been used in programming-by-demonstration work with specialized statistical models such as task-parameterized Gaussian mixture models (TP-GMMs). We show that deep networks can learn manipulation tasks with performance that meets or exceeds that of past statistical models, given the same small numbers of demonstrations (5-30 in our tests), without any pretraining. Data augmentation is important for good performance and training the deep networks to be equivariant to frame transformations. Transformers performed slightly better than parameter-matched long-short-term-memory (LSTM) networks, and transformers had better training and inference times. 
In addition to testing these methods with physical tasks, we used a family of synthetic tasks to show that larger transformer models exhibit positive transfer across dozens of tasks, performing better on each task as they are trained on others. These results suggest that deep networks are potential alternatives to TP-GMM and related methods, having the advantage of needing fewer examples per task as the number of tasks grows. The results also suggest that the large data requirements of end-to-end manipulation learning are mainly due to perceptual factors, which may help to improve the design of end-to-end systems in the future.</div></div>\",\"PeriodicalId\":49592,\"journal\":{\"name\":\"Robotics and Autonomous Systems\",\"volume\":\"193 \",\"pages\":\"Article 105056\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-05-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics and Autonomous Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0921889025001423\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Autonomous Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0921889025001423","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Deep networks for few-shot manipulation learning from scratch
Deep networks can learn to process raw sensor data and produce control output for diverse tasks. However, to leverage these models’ flexibility and expressive power, past studies have trained them on massive amounts of data. In contrast, in this work, we attempt to train deep networks from scratch with very small datasets of object poses and gripper trajectories from manipulation-task demonstrations. The same setting has previously been used in programming-by-demonstration work with specialized statistical models such as task-parameterized Gaussian mixture models (TP-GMMs). We show that deep networks can learn manipulation tasks with performance that meets or exceeds that of past statistical models, given the same small numbers of demonstrations (5–30 in our tests), without any pretraining. Data augmentation is important both for good performance and for training the deep networks to be equivariant to frame transformations. Transformers performed slightly better than parameter-matched long short-term memory (LSTM) networks and had shorter training and inference times. In addition to testing these methods on physical tasks, we used a family of synthetic tasks to show that larger transformer models exhibit positive transfer across dozens of tasks, performing better on each task as they are trained on others. These results suggest that deep networks are potential alternatives to TP-GMMs and related methods, with the advantage of needing fewer examples per task as the number of tasks grows. The results also suggest that the large data requirements of end-to-end manipulation learning are mainly due to perceptual factors, an observation that may help to improve the design of end-to-end systems in the future.
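The abstract credits data augmentation with frame transformations for much of the networks' performance. As a rough, hypothetical sketch of that idea (the paper's actual implementation is not shown on this page, and all function names and parameters below are illustrative), the Python snippet applies one random planar (SE(2)) transform jointly to a demonstration's gripper trajectory and its task-frame poses, so the network sees the same task under many frame configurations:

import numpy as np

def random_se2(max_angle=np.pi, max_shift=0.2, rng=None):
    # Sample a random planar rotation and translation (SE(2)) as a 3x3 homogeneous matrix.
    if rng is None:
        rng = np.random.default_rng()
    theta = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def transform_points(T, xy):
    # Apply a 3x3 homogeneous transform T to an (N, 2) array of points.
    homo = np.hstack([xy, np.ones((len(xy), 1))])
    return (homo @ T.T)[:, :2]

def augment_demo(gripper_xy, frame_poses, rng=None):
    # Apply one random frame transformation to an entire demonstration:
    #   gripper_xy:  (N, 2) gripper positions along the trajectory
    #   frame_poses: (K, 3, 3) homogeneous poses of K task frames (e.g., object poses)
    # Transforming inputs and targets together exposes the network to the same
    # task under many frame configurations, encouraging equivariance.
    T = random_se2(rng=rng)
    aug_xy = transform_points(T, gripper_xy)
    aug_frames = np.einsum('ij,kjl->kil', T, frame_poses)  # left-multiply T onto each frame pose
    return aug_xy, aug_frames

# Example: augment one synthetic demonstration.
rng = np.random.default_rng(0)
demo = np.stack([np.linspace(0.0, 1.0, 50), np.zeros(50)], axis=1)  # straight-line gripper path
frames = np.tile(np.eye(3), (2, 1, 1))                              # two identity task frames
aug_xy, aug_frames = augment_demo(demo, frames, rng=rng)
print(aug_xy.shape, aug_frames.shape)  # (50, 2) (2, 3, 3)

Each of the 5–30 demonstrations could be replayed through many such random transforms per training epoch; this is one plausible way to realize the augmentation described above, not necessarily the authors' method.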
Journal introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.