L2D2: Robot Learning from 2D drawings
Shaunak A. Mehta, Heramb Nemlekar, Hari Sumant, Dylan P. Losey
Autonomous Robots, 49(3), published 2025-09-15. DOI: 10.1007/s10514-025-10210-x
Full text: https://link.springer.com/article/10.1007/s10514-025-10210-x
Citations: 0
Abstract
Robots should learn new tasks from humans. But how do humans convey what they want the robot to do? Existing methods largely rely on humans physically guiding the robot arm throughout their intended task. Unfortunately — as we scale up the amount of data — physical guidance becomes prohibitively burdensome. Humans not only need to operate the robot hardware but must also modify the environment (e.g., moving and resetting objects) to provide multiple task examples. In this work we propose L2D2, a sketching interface and imitation learning algorithm where humans can provide demonstrations by drawing the task. L2D2 starts with a single image of the robot arm and its workspace. Using a tablet, users draw and label trajectories on this image to illustrate how the robot should act. To collect new and diverse demonstrations, we no longer need the human to physically reset the workspace; instead, L2D2 leverages vision-language segmentation to autonomously vary object locations and generate synthetic images for the human to draw upon. We recognize that drawing trajectories is not as information-rich as physically demonstrating the task. Drawings are 2-dimensional and do not capture how the robot’s actions affect its environment. To address these fundamental challenges, the next stage of L2D2 grounds the human’s static, 2D drawings in our dynamic, 3D world by leveraging a small set of physical demonstrations. Our experiments and user study suggest that L2D2 enables humans to provide more demonstrations with less time and effort than traditional approaches, and users prefer drawings over physical manipulation. When compared to other drawing-based approaches, we find that L2D2 learns more performant robot policies, requires a smaller dataset, and can generalize to longer-horizon tasks. See our project website: https://collab.me.vt.edu/L2D2/
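The central challenge the abstract highlights is that a drawing is 2D while the robot acts in 3D. To illustrate why drawn trajectories are underdetermined, the minimal Python sketch below back-projects drawn pixel coordinates into 3D camera-frame waypoints with a standard pinhole model. The intrinsic matrix `K` and the per-point depths `z_est` are hypothetical stand-ins; L2D2 itself resolves this depth ambiguity by grounding the drawings with a small set of physical demonstrations rather than assumed depths.

import numpy as np

def lift_drawing_to_3d(pixels, depths, K):
    """Back-project a drawn 2D trajectory into 3D camera-frame waypoints.

    Illustrative sketch only: it assumes a known per-point depth, which
    is exactly the information a flat drawing does not provide.
    """
    pixels = np.asarray(pixels, dtype=float)      # (N, 2) pixel coordinates (u, v)
    depths = np.asarray(depths, dtype=float)      # (N,) assumed depths in meters
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])             # (N, 3) homogeneous pixels
    rays = (np.linalg.inv(K) @ homog.T).T         # normalized camera rays
    return rays * depths[:, None]                 # scale each ray by its depth -> (N, 3)

# Hypothetical usage with a 640x480 camera:
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
drawn = [(320, 240), (360, 250), (400, 270)]      # pixels traced on the tablet image
z_est = [0.55, 0.53, 0.50]                        # assumed depths along the stroke
print(lift_drawing_to_3d(drawn, z_est, K))

Without the depth estimates every drawn pixel corresponds to an entire ray of possible 3D points, which is why the paper pairs the drawings with a few physical demonstrations to anchor them in the dynamic 3D workspace.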
Journal Description
Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development.
The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.