Dechao Chen;Zhengwen Chen;Xiangyan Zheng;Weiling Xu;Chencong Ma;Chentao Mao
{"title":"ADP:自适应扩散策略在学习和实践中激发机器人思维","authors":"Dechao Chen;Zhengwen Chen;Xiangyan Zheng;Weiling Xu;Chencong Ma;Chentao Mao","doi":"10.1109/TASE.2025.3612396","DOIUrl":null,"url":null,"abstract":"Adaptive control policies for robots often require balancing generalization from large offline datasets with efficient adaptation to specific deployment conditions. In this paper, we propose Adaptive Diffusion Policy (ADP), a two-stage framework that integrates temporal-aware diffusion models with parameter-efficient LoRA adaptation. First, in the learning stage, ADP imitates and generates actions based on image and video signals from a meager amount of expert demonstrations, considering both spatial and temporal information. This component contrasts with most existing works, which focus solely on spatial information. Second, in the practice stage, ADP incorporates a low-rank adaptation module into the policy, subsequently training it using residual reinforcement learning with minimal environment interaction. Experiments conducted on Meta-World benchmark demonstrate the efficiency of each ADP component and the superiority of ADP over representative baseline methods. Note to Practitioners—This work introduces Adaptive Diffusion Policy (ADP), a two-stage visuomotor framework that first learns from just a few image-and-video demonstrations by modeling both spatial and temporal cues, then rapidly refines its behavior via a lightweight low-rank adapter and residual reinforcement learning. The ADP enables swift skill acquisition on new tasks with minimal expert data and limited environment trials, making it ideal for industrial or household robots where extensive data collection is impractical. To apply ADP, collect a small set of demonstration clips, train the diffusion-based policy offline, and deploy the adapter online for in situ fine-tuning. The proposed Meta-World results show the ADP’s consistent gains over standard imitation and residual RL baselines, which is very easy for practitioners in multiple real-world robot scenarios.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"21585-21594"},"PeriodicalIF":6.4000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ADP: Adaptive Diffusion Policy Energizes Robots Thinking in Both Learning and Practice\",\"authors\":\"Dechao Chen;Zhengwen Chen;Xiangyan Zheng;Weiling Xu;Chencong Ma;Chentao Mao\",\"doi\":\"10.1109/TASE.2025.3612396\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Adaptive control policies for robots often require balancing generalization from large offline datasets with efficient adaptation to specific deployment conditions. In this paper, we propose Adaptive Diffusion Policy (ADP), a two-stage framework that integrates temporal-aware diffusion models with parameter-efficient LoRA adaptation. First, in the learning stage, ADP imitates and generates actions based on image and video signals from a meager amount of expert demonstrations, considering both spatial and temporal information. This component contrasts with most existing works, which focus solely on spatial information. Second, in the practice stage, ADP incorporates a low-rank adaptation module into the policy, subsequently training it using residual reinforcement learning with minimal environment interaction. 
Experiments conducted on Meta-World benchmark demonstrate the efficiency of each ADP component and the superiority of ADP over representative baseline methods. Note to Practitioners—This work introduces Adaptive Diffusion Policy (ADP), a two-stage visuomotor framework that first learns from just a few image-and-video demonstrations by modeling both spatial and temporal cues, then rapidly refines its behavior via a lightweight low-rank adapter and residual reinforcement learning. The ADP enables swift skill acquisition on new tasks with minimal expert data and limited environment trials, making it ideal for industrial or household robots where extensive data collection is impractical. To apply ADP, collect a small set of demonstration clips, train the diffusion-based policy offline, and deploy the adapter online for in situ fine-tuning. The proposed Meta-World results show the ADP’s consistent gains over standard imitation and residual RL baselines, which is very easy for practitioners in multiple real-world robot scenarios.\",\"PeriodicalId\":51060,\"journal\":{\"name\":\"IEEE Transactions on Automation Science and Engineering\",\"volume\":\"22 \",\"pages\":\"21585-21594\"},\"PeriodicalIF\":6.4000,\"publicationDate\":\"2025-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Automation Science and Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11174996/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11174996/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
ADP: Adaptive Diffusion Policy Energizes Robots Thinking in Both Learning and Practice
Adaptive control policies for robots often require balancing generalization from large offline datasets with efficient adaptation to specific deployment conditions. In this paper, we propose Adaptive Diffusion Policy (ADP), a two-stage framework that integrates temporal-aware diffusion models with parameter-efficient LoRA adaptation. First, in the learning stage, ADP imitates and generates actions from image and video signals drawn from a small number of expert demonstrations, considering both spatial and temporal information. This contrasts with most existing works, which focus solely on spatial information. Second, in the practice stage, ADP incorporates a low-rank adaptation module into the policy and trains it with residual reinforcement learning using minimal environment interaction. Experiments conducted on the Meta-World benchmark demonstrate the effectiveness of each ADP component and the superiority of ADP over representative baseline methods.

Note to Practitioners—This work introduces Adaptive Diffusion Policy (ADP), a two-stage visuomotor framework that first learns from just a few image-and-video demonstrations by modeling both spatial and temporal cues, then rapidly refines its behavior via a lightweight low-rank adapter and residual reinforcement learning. ADP enables rapid skill acquisition on new tasks with minimal expert data and limited environment trials, making it well suited for industrial or household robots where extensive data collection is impractical. To apply ADP, collect a small set of demonstration clips, train the diffusion-based policy offline, and deploy the adapter online for in situ fine-tuning. Results on the Meta-World benchmark show consistent gains for ADP over standard imitation and residual RL baselines, suggesting it can be adopted with little effort across a range of real-world robot scenarios.
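The two adaptation ingredients named in the abstract, a parameter-efficient low-rank (LoRA) adapter and residual reinforcement learning layered on a frozen imitation policy, can be sketched in a few lines. The snippet below is a minimal illustration of those general techniques in PyTorch; the names (`LoRALinear`, `adapted_action`, `residual_head`) are hypothetical and are not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B(A x)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)   # adapter starts as a no-op w.r.t. the base policy
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))


def adapted_action(base_policy: nn.Module, obs: torch.Tensor,
                   residual_head: nn.Module | None = None) -> torch.Tensor:
    """Compose the frozen imitation policy's action with a small learned residual."""
    with torch.no_grad():
        a_base = base_policy(obs)            # action proposed by the offline-trained policy
    a_res = residual_head(obs) if residual_head is not None else 0.0
    return a_base + a_res                    # residual RL refines, rather than replaces, the base action
```

Under this reading of the practice stage, one would freeze the offline-trained diffusion policy, wrap selected linear layers with such adapters, and train only the adapter and residual parameters against the RL objective, keeping both the online interaction budget and the number of trainable parameters small.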
Journal Introduction:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.