{"title":"NavTr:利用可学习的变换器查询进行对象-目标导航","authors":"Qiuyu Mao;Jikai Wang;Meng Xu;Zonghai Chen","doi":"10.1109/LRA.2024.3497718","DOIUrl":null,"url":null,"abstract":"This letter introduces \n<bold>Nav</b>\nigation \n<bold>Tr</b>\nansformer (NavTr), a novel framework for object-goal navigation using Transformer queries to enhance the learning and representation of environment states. By integrating semantic information, object positions, and neighborhood information, NavTr creates a unified, comprehensive, and extensible state representation for the object-goal navigating task. In the framework, the Transformer queries implicitly learn inter-object relationships, which facilitates high-level understanding of the environment. Additionally, NavTr implements target-oriented supervisory signals, such as rotation rewards and spatial loss, which improve exploration efficiency in the reinforcement learning framework. NavTr outperforms popular graph-based and Attention-based methods by a large margin in terms of success rate (SR) and success weighted by path length (SPL). Extensive experiments on the AI2-THOR dataset demonstrate the effectiveness of our approach.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11738-11745"},"PeriodicalIF":4.6000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NavTr: Object-Goal Navigation With Learnable Transformer Queries\",\"authors\":\"Qiuyu Mao;Jikai Wang;Meng Xu;Zonghai Chen\",\"doi\":\"10.1109/LRA.2024.3497718\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This letter introduces \\n<bold>Nav</b>\\nigation \\n<bold>Tr</b>\\nansformer (NavTr), a novel framework for object-goal navigation using Transformer queries to enhance the learning and representation of environment states. By integrating semantic information, object positions, and neighborhood information, NavTr creates a unified, comprehensive, and extensible state representation for the object-goal navigating task. In the framework, the Transformer queries implicitly learn inter-object relationships, which facilitates high-level understanding of the environment. Additionally, NavTr implements target-oriented supervisory signals, such as rotation rewards and spatial loss, which improve exploration efficiency in the reinforcement learning framework. NavTr outperforms popular graph-based and Attention-based methods by a large margin in terms of success rate (SR) and success weighted by path length (SPL). Extensive experiments on the AI2-THOR dataset demonstrate the effectiveness of our approach.\",\"PeriodicalId\":13241,\"journal\":{\"name\":\"IEEE Robotics and Automation Letters\",\"volume\":\"9 12\",\"pages\":\"11738-11745\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2024-11-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Robotics and Automation Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10757356/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10757356/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
NavTr: Object-Goal Navigation With Learnable Transformer Queries
This letter introduces
Nav
igation
Tr
ansformer (NavTr), a novel framework for object-goal navigation using Transformer queries to enhance the learning and representation of environment states. By integrating semantic information, object positions, and neighborhood information, NavTr creates a unified, comprehensive, and extensible state representation for the object-goal navigating task. In the framework, the Transformer queries implicitly learn inter-object relationships, which facilitates high-level understanding of the environment. Additionally, NavTr implements target-oriented supervisory signals, such as rotation rewards and spatial loss, which improve exploration efficiency in the reinforcement learning framework. NavTr outperforms popular graph-based and Attention-based methods by a large margin in terms of success rate (SR) and success weighted by path length (SPL). Extensive experiments on the AI2-THOR dataset demonstrate the effectiveness of our approach.
期刊介绍:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.