GauTOAO: Gaussian-based Task-Oriented Affordance of Objects

arXiv - CS - Robotics | Pub Date: 2024-09-18 | DOI: arxiv-2409.11941

Jiawen Wang, Dingsheng Luo



Abstract

When your robot grasps an object using dexterous hands or grippers, it should understand the Task-Oriented Affordances of the Object (TOAO), as different tasks often require attention to specific parts of the object. To address this challenge, we propose GauTOAO, a Gaussian-based framework for Task-Oriented Affordance of Objects, which leverages vision-language models in a zero-shot manner to predict affordance-relevant regions of an object, given a natural language query. Our approach introduces a new paradigm, "static camera, moving object," which allows the robot to better observe and understand the object in hand during manipulation. Existing methods often lack effective spatial grouping; GauTOAO addresses this limitation by extracting a comprehensive 3D object mask using DINO features. This mask is then used to conditionally query the Gaussians, producing a refined semantic distribution over the object for the specified task. This approach results in more accurate TOAO extraction, enhancing the robot's understanding of the object and improving task performance. We validate the effectiveness of GauTOAO through real-world experiments, demonstrating its capability to generalize across various tasks.
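The abstract does not say how the mask-conditioned query is computed, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each Gaussian carries a feature vector in the same embedding space as the language query, scores Gaussians by cosine similarity, and normalizes with a softmax restricted to Gaussians inside the object mask. The function names, the `temperature` parameter, and the toy two-dimensional features are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def masked_affordance_distribution(
    gaussian_features: Sequence[Sequence[float]],
    object_mask: Sequence[bool],
    query_embedding: Sequence[float],
    temperature: float = 0.1,  # hypothetical softmax temperature
) -> List[float]:
    """Softmax over query similarities, restricted to masked-in Gaussians.

    Gaussians outside the object mask always receive probability 0.0.
    If the mask is empty, every entry is 0.0.
    """
    if len(gaussian_features) != len(object_mask):
        raise ValueError("one mask entry is required per Gaussian")

    # Scaled similarity for Gaussians on the object, None for the rest.
    scores = [
        cosine(feat, query_embedding) / temperature if inside else None
        for feat, inside in zip(gaussian_features, object_mask)
    ]
    kept = [s for s in scores if s is not None]
    if not kept:
        return [0.0] * len(scores)

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(kept)
    weights = [math.exp(s - peak) if s is not None else 0.0 for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: three Gaussians, the third lies outside the object mask.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
mask = [True, True, False]
query = [1.0, 0.0]  # stand-in for a text embedding of the task query
distribution = masked_affordance_distribution(features, mask, query)
```

In this toy case the third Gaussian matches the query exactly but is excluded because it falls outside the mask, which is the role the abstract assigns to the 3D object mask: restricting the semantic distribution to the object itself.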
