Human2bot: learning zero-shot reward functions for robotic manipulation from human demonstrations

Impact Factor 3.7 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yasir Salam, Yinbei Li, Jonas Herzog, Jiaqiang Yang
{"title":"Human2bot: learning zero-shot reward functions for robotic manipulation from human demonstrations","authors":"Yasir Salam,&nbsp;Yinbei Li,&nbsp;Jonas Herzog,&nbsp;Jiaqiang Yang","doi":"10.1007/s10514-025-10193-9","DOIUrl":null,"url":null,"abstract":"<div><p>Developing effective reward functions is crucial for robot learning, as they guide behavior and facilitate adaptation to human-like tasks. We present Human2Bot (H2B), advancing the learning of such a generalized multi-task reward function that can be used zero-shot to execute unknown tasks in unseen environments. H2B is a newly designed task similarity estimation model that is trained on a large dataset of human videos. The model determines whether two videos from different environments represent the same task. At test time, the model serves as a reward function, evaluating how closely a robot’s execution matches the human demonstration. While previous approaches necessitate robot-specific data to learn reward functions or policies, our method can learn without any robot datasets. To achieve generalization in robotic environments, we incorporate a domain augmentation process that generates synthetic videos with varied visual appearances resembling simulation environments, alongside a multi-scale inter-frame attention mechanism that aligns human and robot task understanding. Finally, H2B is integrated with Visual Model Predictive Control (VMPC) to perform manipulation tasks in simulation and on the xARM6 robot in real-world settings. Our approach outperforms previous methods in simulated and real-world environments trained solely on human data, eliminating the need for privileged robot datasets.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 2","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Autonomous Robots","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10514-025-10193-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Developing effective reward functions is crucial for robot learning, as they guide behavior and facilitate adaptation to human-like tasks. We present Human2Bot (H2B), advancing the learning of such a generalized multi-task reward function that can be used zero-shot to execute unknown tasks in unseen environments. H2B is a newly designed task-similarity estimation model trained on a large dataset of human videos. The model determines whether two videos from different environments represent the same task. At test time, the model serves as a reward function, evaluating how closely a robot’s execution matches the human demonstration. While previous approaches necessitate robot-specific data to learn reward functions or policies, our method learns without any robot datasets. To achieve generalization in robotic environments, we incorporate a domain augmentation process that generates synthetic videos with varied visual appearances resembling simulation environments, alongside a multi-scale inter-frame attention mechanism that aligns human and robot task understanding. Finally, H2B is integrated with Visual Model Predictive Control (VMPC) to perform manipulation tasks in simulation and on the xARM6 robot in real-world settings. Trained solely on human data, our approach outperforms previous methods in simulated and real-world environments, eliminating the need for privileged robot datasets.
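The abstract describes a concrete control loop: a learned task-similarity score stands in for a hand-designed reward inside a sampling-based Visual Model Predictive Control planner. Below is a minimal sketch of that idea in Python; all class names, method signatures, and the trivial pixel-space stand-ins (SimilarityModel, VideoPredictor, vmpc_step) are hypothetical illustrations based on the abstract, not the paper's implementation.

```python
import numpy as np

class SimilarityModel:
    """Placeholder for an H2B-style task-similarity network: scores how
    closely a robot video matches a human demonstration of the same task.
    Here a trivial pixel-space proxy is used so the sketch runs end to end."""
    def score(self, robot_frames, human_frames):
        # The real model is learned from human videos (multi-scale
        # inter-frame attention); this dummy uses negative pixel error.
        return -float(np.mean((robot_frames.mean(0) - human_frames.mean(0)) ** 2))

class VideoPredictor:
    """Placeholder visual dynamics model for VMPC: predicts future frames
    from the current frame and a candidate action sequence."""
    def rollout(self, frame, actions):
        # A real predictor would be an action-conditioned video model;
        # this dummy just repeats the current frame for each action.
        return np.stack([frame] * len(actions))

def vmpc_step(frame, human_demo, reward_model, dynamics,
              n_samples=64, horizon=10, action_dim=4, seed=0):
    """One receding-horizon planning step: sample action sequences,
    imagine their visual outcomes, score each imagined video against
    the human demo, and return the first action of the best sequence."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, (n_samples, horizon, action_dim))
    scores = [reward_model.score(dynamics.rollout(frame, a), human_demo)
              for a in candidates]
    return candidates[int(np.argmax(scores))][0]

# Usage with dummy data:
frame = np.zeros((64, 64, 3))
demo = np.zeros((30, 64, 64, 3))
action = vmpc_step(frame, demo, SimilarityModel(), VideoPredictor())
```

The sampler here is simple random shooting for brevity; a cross-entropy-method optimizer is a common drop-in refinement for visual MPC planners of this kind.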

Source journal
Autonomous Robots (Engineering & Technology – Robotics)
CiteScore: 7.90
Self-citation rate: 5.70%
Articles per year: 46
Review time: 3 months
Journal description: Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development. The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.