{"title":"从零开始进行少量操作学习的深度网络","authors":"Yinghan Chen , Xueyang Yao , Bryan Tripp","doi":"10.1016/j.robot.2025.105056","DOIUrl":null,"url":null,"abstract":"<div><div>Deep networks can learn to process raw sensor data and produce control output for diverse tasks. However, to leverage these models’ flexibility and expressive power, past studies have trained them on massive amounts of data. In contrast, in this work, we attempt to train deep networks from scratch with very small datasets of object pose and gripper trajectories in manipulation-task demonstrations. The same setting has previously been used in programming-by-demonstration work with specialized statistical models such as task-parameterized Gaussian mixture models (TP-GMMs). We show that deep networks can learn manipulation tasks with performance that meets or exceeds that of past statistical models, given the same small numbers of demonstrations (5-30 in our tests), without any pretraining. Data augmentation is important for good performance and training the deep networks to be equivariant to frame transformations. Transformers performed slightly better than parameter-matched long-short-term-memory (LSTM) networks, and transformers had better training and inference times. In addition to testing these methods with physical tasks, we used a family of synthetic tasks to show that larger transformer models exhibit positive transfer across dozens of tasks, performing better on each task as they are trained on others. These results suggest that deep networks are potential alternatives to TP-GMM and related methods, having the advantage of needing fewer examples per task as the number of tasks grows. The results also suggest that the large data requirements of end-to-end manipulation learning are mainly due to perceptual factors, which may help to improve the design of end-to-end systems in the future.</div></div>","PeriodicalId":49592,"journal":{"name":"Robotics and Autonomous Systems","volume":"193 ","pages":"Article 105056"},"PeriodicalIF":4.3000,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep networks for few-shot manipulation learning from scratch\",\"authors\":\"Yinghan Chen , Xueyang Yao , Bryan Tripp\",\"doi\":\"10.1016/j.robot.2025.105056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Deep networks can learn to process raw sensor data and produce control output for diverse tasks. However, to leverage these models’ flexibility and expressive power, past studies have trained them on massive amounts of data. In contrast, in this work, we attempt to train deep networks from scratch with very small datasets of object pose and gripper trajectories in manipulation-task demonstrations. The same setting has previously been used in programming-by-demonstration work with specialized statistical models such as task-parameterized Gaussian mixture models (TP-GMMs). We show that deep networks can learn manipulation tasks with performance that meets or exceeds that of past statistical models, given the same small numbers of demonstrations (5-30 in our tests), without any pretraining. Data augmentation is important for good performance and training the deep networks to be equivariant to frame transformations. Transformers performed slightly better than parameter-matched long-short-term-memory (LSTM) networks, and transformers had better training and inference times. 
In addition to testing these methods with physical tasks, we used a family of synthetic tasks to show that larger transformer models exhibit positive transfer across dozens of tasks, performing better on each task as they are trained on others. These results suggest that deep networks are potential alternatives to TP-GMM and related methods, having the advantage of needing fewer examples per task as the number of tasks grows. The results also suggest that the large data requirements of end-to-end manipulation learning are mainly due to perceptual factors, which may help to improve the design of end-to-end systems in the future.</div></div>\",\"PeriodicalId\":49592,\"journal\":{\"name\":\"Robotics and Autonomous Systems\",\"volume\":\"193 \",\"pages\":\"Article 105056\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-05-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics and Autonomous Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0921889025001423\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Autonomous Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0921889025001423","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Deep networks for few-shot manipulation learning from scratch
Deep networks can learn to process raw sensor data and produce control output for diverse tasks. However, to leverage these models’ flexibility and expressive power, past studies have trained them on massive amounts of data. In contrast, in this work, we attempt to train deep networks from scratch with very small datasets of object poses and gripper trajectories from manipulation-task demonstrations. The same setting has previously been used in programming-by-demonstration work with specialized statistical models such as task-parameterized Gaussian mixture models (TP-GMMs). We show that deep networks can learn manipulation tasks with performance that meets or exceeds that of past statistical models, given the same small numbers of demonstrations (5–30 in our tests), without any pretraining. Data augmentation is important both for good performance and for training the deep networks to be equivariant to frame transformations. Transformers performed slightly better than parameter-matched long short-term memory (LSTM) networks and had shorter training and inference times. In addition to testing these methods on physical tasks, we used a family of synthetic tasks to show that larger transformer models exhibit positive transfer across dozens of tasks, performing better on each task as they are trained on others. These results suggest that deep networks are potential alternatives to TP-GMMs and related methods, with the advantage of needing fewer examples per task as the number of tasks grows. The results also suggest that the large data requirements of end-to-end manipulation learning are mainly due to perceptual factors, an observation that may help to improve the design of end-to-end systems in the future.
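The abstract credits data augmentation with frame transformations for much of the networks' performance. As a rough, hypothetical sketch of that idea (the paper's actual implementation is not shown on this page, and all function names and parameters below are illustrative), the Python snippet applies one random planar (SE(2)) transform jointly to a demonstration's gripper trajectory and its task-frame poses, so the network sees the same task under many frame configurations:

import numpy as np

def random_se2(max_angle=np.pi, max_shift=0.2, rng=None):
    # Sample a random planar rotation and translation (SE(2)) as a 3x3 homogeneous matrix.
    if rng is None:
        rng = np.random.default_rng()
    theta = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def transform_points(T, xy):
    # Apply a 3x3 homogeneous transform T to an (N, 2) array of points.
    homo = np.hstack([xy, np.ones((len(xy), 1))])
    return (homo @ T.T)[:, :2]

def augment_demo(gripper_xy, frame_poses, rng=None):
    # Apply one random frame transformation to an entire demonstration:
    #   gripper_xy:  (N, 2) gripper positions along the trajectory
    #   frame_poses: (K, 3, 3) homogeneous poses of K task frames (e.g., object poses)
    # Transforming inputs and targets together exposes the network to the same
    # task under many frame configurations, encouraging equivariance.
    T = random_se2(rng=rng)
    aug_xy = transform_points(T, gripper_xy)
    aug_frames = np.einsum('ij,kjl->kil', T, frame_poses)  # left-multiply T onto each frame pose
    return aug_xy, aug_frames

# Example: augment one synthetic demonstration.
rng = np.random.default_rng(0)
demo = np.stack([np.linspace(0.0, 1.0, 50), np.zeros(50)], axis=1)  # straight-line gripper path
frames = np.tile(np.eye(3), (2, 1, 1))                              # two identity task frames
aug_xy, aug_frames = augment_demo(demo, frames, rng=rng)
print(aug_xy.shape, aug_frames.shape)  # (50, 2) (2, 3, 3)

Each of the 5–30 demonstrations could be replayed through many such random transforms per training epoch; this is one plausible way to realize the augmentation described above, not necessarily the authors' method.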
Journal introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.