{"title":"自主无人机导航的跨模态视觉运动策略学习","authors":"Yuhang Zhang;Jiaping Xiao;Mir Feroskhan","doi":"10.1109/LRA.2025.3559824","DOIUrl":null,"url":null,"abstract":"Developing effective vision-based navigation algorithms adapting to various scenarios is a significant challenge for autonomous drone systems, with vast potential in diverse real-world applications. This paper proposes a novel visuomotor policy learning framework for monocular autonomous navigation, combining cross-modal contrastive learning with deep reinforcement learning (DRL) to train a visuomotor policy. Our approach first leverages contrastive learning to extract consistent, task-focused visual representations from high-dimensional RGB images as depth images, and then directly maps these representations to action commands with DRL. This framework enables RGB images to capture structural and spatial information similar to depth images, which remains largely invariant under changes in lighting and texture, thereby maintaining robustness across various environments. We evaluate our approach through simulated and physical experiments, showing that our visuomotor policy outperforms baseline methods in both effectiveness and resilience to unseen visual disturbances. Our findings suggest that the key to enhancing transferability in monocular RGB-based navigation lies in achieving consistent, well-aligned visual representations across scenarios, which is an aspect often lacking in traditional end-to-end approaches.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 6","pages":"5425-5432"},"PeriodicalIF":4.6000,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning Cross-Modal Visuomotor Policies for Autonomous Drone Navigation\",\"authors\":\"Yuhang Zhang;Jiaping Xiao;Mir Feroskhan\",\"doi\":\"10.1109/LRA.2025.3559824\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developing effective vision-based navigation algorithms adapting to various scenarios is a significant challenge for autonomous drone systems, with vast potential in diverse real-world applications. This paper proposes a novel visuomotor policy learning framework for monocular autonomous navigation, combining cross-modal contrastive learning with deep reinforcement learning (DRL) to train a visuomotor policy. Our approach first leverages contrastive learning to extract consistent, task-focused visual representations from high-dimensional RGB images as depth images, and then directly maps these representations to action commands with DRL. This framework enables RGB images to capture structural and spatial information similar to depth images, which remains largely invariant under changes in lighting and texture, thereby maintaining robustness across various environments. We evaluate our approach through simulated and physical experiments, showing that our visuomotor policy outperforms baseline methods in both effectiveness and resilience to unseen visual disturbances. 
Our findings suggest that the key to enhancing transferability in monocular RGB-based navigation lies in achieving consistent, well-aligned visual representations across scenarios, which is an aspect often lacking in traditional end-to-end approaches.\",\"PeriodicalId\":13241,\"journal\":{\"name\":\"IEEE Robotics and Automation Letters\",\"volume\":\"10 6\",\"pages\":\"5425-5432\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2025-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Robotics and Automation Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10960642/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10960642/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Learning Cross-Modal Visuomotor Policies for Autonomous Drone Navigation
Developing effective vision-based navigation algorithms that adapt to various scenarios is a significant challenge for autonomous drone systems, one with vast potential in diverse real-world applications. This paper proposes a novel visuomotor policy learning framework for monocular autonomous navigation that combines cross-modal contrastive learning with deep reinforcement learning (DRL) to train a visuomotor policy. Our approach first leverages contrastive learning to extract consistent, task-focused visual representations from high-dimensional RGB images, aligning them with the representations of corresponding depth images, and then directly maps these representations to action commands with DRL. This framework enables RGB images to capture structural and spatial information similar to that of depth images, information that remains largely invariant under changes in lighting and texture, thereby maintaining robustness across various environments. We evaluate our approach through simulated and physical experiments, showing that our visuomotor policy outperforms baseline methods in both effectiveness and resilience to unseen visual disturbances. Our findings suggest that the key to enhancing transferability in monocular RGB-based navigation lies in achieving consistent, well-aligned visual representations across scenarios, an aspect often lacking in traditional end-to-end approaches.
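To make the cross-modal idea concrete, the following is a minimal PyTorch sketch of the mechanism the abstract describes: a contrastive objective that pulls an RGB encoder's embedding of a scene toward a depth encoder's embedding of the same scene, so the RGB features capture depth-like structural information. This is an illustrative sketch, not the authors' released implementation; the encoder architecture, embedding size, temperature, and all names are assumptions, and an InfoNCE-style loss stands in for whatever contrastive formulation the paper actually uses.

    # Hypothetical sketch of cross-modal contrastive alignment (not the paper's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvEncoder(nn.Module):
        """Tiny CNN mapping an image to a unit-norm embedding (sizes are placeholders)."""
        def __init__(self, in_channels: int, embed_dim: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, embed_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return F.normalize(self.net(x), dim=-1)  # normalize so dot products are cosine similarities

    def cross_modal_infonce(z_rgb: torch.Tensor, z_depth: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
        """InfoNCE-style loss: each RGB embedding should match the depth embedding
        of the same scene (diagonal of the similarity matrix) against the other
        scenes in the batch, which act as negatives."""
        logits = z_rgb @ z_depth.t() / temperature                      # (B, B) similarity matrix
        targets = torch.arange(z_rgb.size(0), device=z_rgb.device)     # positives on the diagonal
        return F.cross_entropy(logits, targets)

    # Usage with paired RGB/depth observations of the same scenes (random stand-in data).
    rgb_enc, depth_enc = ConvEncoder(3), ConvEncoder(1)
    rgb = torch.randn(8, 3, 64, 64)      # batch of RGB frames
    depth = torch.randn(8, 1, 64, 64)    # corresponding depth frames
    loss = cross_modal_infonce(rgb_enc(rgb), depth_enc(depth))
    loss.backward()

In a pipeline along these lines, the trained RGB encoder would then supply the state input to a DRL agent that outputs action commands, with the depth branch needed only during representation learning, which is consistent with the monocular (RGB-only) deployment the abstract describes.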
Journal Introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.