{"title":"基于鲁棒对比学习的视觉导航框架","authors":"Zengmao Wang, Jianhua Hu, Qifei Tang, Wei Gao","doi":"10.1002/rob.22508","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Real-world robots will face a wide variety of complex environments when performing navigation or exploration tasks, especially in situations where the robots have never been seen before. Usually, robots need to establish local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a wild grassy path or pavement on either side of a road, it is difficult for robots to plan routes through navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning called Contrastive Observation–Action in Latent (COAL) space. To extract features from the action space and observation space, respectively, COAL uses two different encoders. At the training stage, COAL does not require any data annotation and a mask approach is employed to keep features with significant differences away from each other in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which can enhance the generalization of the model. At the deployment stage, robots only need the current image as observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data for generating control signals. We evaluate the robustness of COAL in both simulation and real environments. Only 41 min of unlabeled training data is required to allow COAL to explore environments that have never been seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data using other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.</p>\n </div>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"42 5","pages":"2028-2041"},"PeriodicalIF":4.2000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"COAL: Robust Contrastive Learning-Based Visual Navigation Framework\",\"authors\":\"Zengmao Wang, Jianhua Hu, Qifei Tang, Wei Gao\",\"doi\":\"10.1002/rob.22508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Real-world robots will face a wide variety of complex environments when performing navigation or exploration tasks, especially in situations where the robots have never been seen before. Usually, robots need to establish local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a wild grassy path or pavement on either side of a road, it is difficult for robots to plan routes through navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning called Contrastive Observation–Action in Latent (COAL) space. To extract features from the action space and observation space, respectively, COAL uses two different encoders. 
At the training stage, COAL does not require any data annotation and a mask approach is employed to keep features with significant differences away from each other in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which can enhance the generalization of the model. At the deployment stage, robots only need the current image as observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data for generating control signals. We evaluate the robustness of COAL in both simulation and real environments. Only 41 min of unlabeled training data is required to allow COAL to explore environments that have never been seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data using other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.</p>\\n </div>\",\"PeriodicalId\":192,\"journal\":{\"name\":\"Journal of Field Robotics\",\"volume\":\"42 5\",\"pages\":\"2028-2041\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-01-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Field Robotics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/rob.22508\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rob.22508","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Real-world robots face a wide variety of complex environments when performing navigation or exploration tasks, especially environments they have never seen before. Usually, robots need to build local or global maps and then use path-planning algorithms to determine their routes. However, in some environments, such as a grassy path in the wild or the pavement on either side of a road, it is difficult for robots to plan routes from navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning, called Contrastive Observation–Action in Latent (COAL) space. COAL uses two separate encoders to extract features from the observation space and the action space, respectively. At the training stage, COAL requires no data annotation, and a masking approach is employed to push features with significant differences apart in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which enhances the generalization of the model. At the deployment stage, robots need only the current image as the observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data to generate control signals. We evaluate the robustness of COAL in both simulated and real environments. Only 41 min of unlabeled training data are required for COAL to explore environments it has never seen before, even at night. Compared with state-of-the-art methods, COAL shows the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data with other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.
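As a minimal, hypothetical sketch of the bidirectional mutual-information objective described in the abstract, the following PyTorch-style code computes a symmetric InfoNCE (CLIP-style) contrastive loss between observation embeddings and action-sequence embeddings. The encoder outputs, the temperature value, and the omission of the paper's masking strategy are assumptions for illustration, not the released COAL implementation.

import torch
import torch.nn.functional as F

def bidirectional_infonce(obs_emb: torch.Tensor,
                          act_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # obs_emb, act_emb: (B, D) features from the observation and action encoders.
    # L2-normalize so dot products become cosine similarities.
    obs_emb = F.normalize(obs_emb, dim=-1)
    act_emb = F.normalize(act_emb, dim=-1)
    # (B, B) similarity matrix: entry (i, j) compares observation i with action sequence j.
    logits = obs_emb @ act_emb.t() / temperature
    targets = torch.arange(obs_emb.size(0), device=obs_emb.device)
    # Symmetric cross-entropy: observation-to-action and action-to-observation directions.
    loss_o2a = F.cross_entropy(logits, targets)
    loss_a2o = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_o2a + loss_a2o)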
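The deployment step can likewise be sketched as a latent-space lookup: embed the current camera image once, embed a batch of sampled candidate action sequences, and pick the candidate whose feature is most similar to the observation feature. The encoder interfaces and the cosine-similarity selection criterion below are assumptions, not the released COAL API.

import torch
import torch.nn.functional as F

@torch.no_grad()
def select_action_sequence(obs_encoder: torch.nn.Module,
                           act_encoder: torch.nn.Module,
                           image: torch.Tensor,        # (3, H, W) current camera frame
                           candidates: torch.Tensor    # (N, T, A) sampled action sequences
                           ) -> torch.Tensor:
    # Hypothetical selection rule: highest cosine similarity in the shared latent space.
    obs_feat = F.normalize(obs_encoder(image.unsqueeze(0)), dim=-1)  # (1, D)
    act_feat = F.normalize(act_encoder(candidates), dim=-1)          # (N, D)
    scores = (act_feat @ obs_feat.t()).squeeze(-1)                   # (N,)
    return candidates[torch.argmax(scores)]                          # (T, A) chosen sequence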
Journal Introduction:
The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments.
The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.