Embodied Reasoning for Discovering Object Properties via Manipulation

2021 IEEE International Conference on Robotics and Automation (ICRA) Pub Date: 2021-05-30 DOI: 10.1109/ICRA48506.2021.9561212

J. Behrens, Michal Nazarczuk, K. Štěpánová, M. Hoffmann, Y. Demiris, K. Mikolajczyk

Citations: 1

Abstract

In this paper, we present an integrated system that combines reasoning over visual and natural-language inputs, action and motion planning, task execution by a robotic arm, object manipulation, and discovery of object properties. A vision-to-action module recognises the objects in the scene and their attributes and analyses enquiries formulated in natural language. It performs multi-modal reasoning and generates a sequence of simple actions that a robot can execute. The scene model and the action sequence are sent to a planning and execution module that generates a collision-free motion plan, simulates the actions, and executes them. We train various components of the system on synthetic data and test on a real robot to show its generalization capabilities. We focus on a tabletop scenario with objects that can be grasped by our embodied agent, i.e., a 7-DoF manipulator with a two-finger gripper. We evaluate the agent on 60 representative queries (e.g., "Check what is on the other side of the soda can"), each repeated 3 times, concerning different objects and tasks in the scene. We perform experiments in simulated and real environments and report the success rate for various components of the system. Our system achieves a success rate of up to 80.6% on challenging scenes and queries. We also analyse and discuss the challenges that such an intelligent embodied system faces.
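The data flow described in the abstract (scene and query in, action sequence out, then planning and execution, then a success rate over repeated trials) can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' implementation: every name here (`SceneObject`, `Action`, `vision_to_action`, `plan_and_execute`, `success_rate`) and the primitive verbs are assumptions made for the example, and the "reasoning" step is a simple substring match standing in for the learned multi-modal module.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene-model entry: an object name plus recognised attributes."""
    name: str
    attributes: tuple


@dataclass(frozen=True)
class Action:
    """Hypothetical simple action primitive that a robot could execute."""
    verb: str
    target: str


def vision_to_action(scene: Sequence[SceneObject], query: str) -> List[Action]:
    """Toy stand-in for the vision-to-action module.

    Finds the first scene object mentioned in the query and returns a fixed
    sequence of primitives for inspecting its far side. Returns an empty
    list when no known object is mentioned.
    """
    text = query.lower()
    for obj in scene:
        if obj.name in text:
            return [Action(verb, obj.name)
                    for verb in ("approach", "grasp", "rotate", "observe")]
    return []


def plan_and_execute(actions: Sequence[Action],
                     executor: Callable[[Action], bool]) -> bool:
    """Toy stand-in for the planning and execution module.

    A trial succeeds only if there is something to do and every action
    succeeds; execution stops at the first failure.
    """
    if not actions:
        return False
    return all(executor(action) for action in actions)


def success_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of successful trials."""
    if not outcomes:
        raise ValueError("no trials recorded")
    return 100.0 * sum(outcomes) / len(outcomes)


def always_ok(action: Action) -> bool:
    """Assumed executor that never fails; a real one would drive the arm."""
    return True


scene = [SceneObject("soda can", ("metal", "cylinder"))]
query = "Check what is on the other side of the soda can"
actions = vision_to_action(scene, query)

# Repeat the same query 3 times, mirroring the evaluation protocol.
outcomes = [plan_and_execute(actions, always_ok) for _ in range(3)]
print([a.verb for a in actions], success_rate(outcomes))
```

With 60 queries each repeated 3 times, the same `success_rate` function would be applied to 180 recorded outcomes; per-component rates could be tallied by recording one outcome list per module.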
