COAL: Robust Contrastive Learning-Based Visual Navigation Framework

IF 4.2 · CAS Tier 2 (Computer Science) · JCR Q2 (Robotics)
Zengmao Wang, Jianhua Hu, Qifei Tang, Wei Gao
{"title":"基于鲁棒对比学习的视觉导航框架","authors":"Zengmao Wang,&nbsp;Jianhua Hu,&nbsp;Qifei Tang,&nbsp;Wei Gao","doi":"10.1002/rob.22508","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Real-world robots will face a wide variety of complex environments when performing navigation or exploration tasks, especially in situations where the robots have never been seen before. Usually, robots need to establish local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a wild grassy path or pavement on either side of a road, it is difficult for robots to plan routes through navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning called Contrastive Observation–Action in Latent (COAL) space. To extract features from the action space and observation space, respectively, COAL uses two different encoders. At the training stage, COAL does not require any data annotation and a mask approach is employed to keep features with significant differences away from each other in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which can enhance the generalization of the model. At the deployment stage, robots only need the current image as observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data for generating control signals. We evaluate the robustness of COAL in both simulation and real environments. Only 41 min of unlabeled training data is required to allow COAL to explore environments that have never been seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data using other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.</p>\n </div>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"42 5","pages":"2028-2041"},"PeriodicalIF":4.2000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"COAL: Robust Contrastive Learning-Based Visual Navigation Framework\",\"authors\":\"Zengmao Wang,&nbsp;Jianhua Hu,&nbsp;Qifei Tang,&nbsp;Wei Gao\",\"doi\":\"10.1002/rob.22508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Real-world robots will face a wide variety of complex environments when performing navigation or exploration tasks, especially in situations where the robots have never been seen before. Usually, robots need to establish local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a wild grassy path or pavement on either side of a road, it is difficult for robots to plan routes through navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning called Contrastive Observation–Action in Latent (COAL) space. To extract features from the action space and observation space, respectively, COAL uses two different encoders. 
At the training stage, COAL does not require any data annotation and a mask approach is employed to keep features with significant differences away from each other in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which can enhance the generalization of the model. At the deployment stage, robots only need the current image as observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data for generating control signals. We evaluate the robustness of COAL in both simulation and real environments. Only 41 min of unlabeled training data is required to allow COAL to explore environments that have never been seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data using other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.</p>\\n </div>\",\"PeriodicalId\":192,\"journal\":{\"name\":\"Journal of Field Robotics\",\"volume\":\"42 5\",\"pages\":\"2028-2041\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-01-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Field Robotics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/rob.22508\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rob.22508","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract


Real-world robots face a wide variety of complex environments when performing navigation or exploration tasks, especially environments they have never seen before. Usually, robots need to build local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a grassy wild path or the pavement on either side of a road, it is difficult for robots to plan routes from navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning, called Contrastive Observation–Action in Latent (COAL) space. COAL uses two different encoders to extract features from the action space and the observation space, respectively. At the training stage, COAL does not require any data annotation, and a mask approach is employed to keep features with significant differences apart in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which enhances the generalization of the model. At the deployment stage, robots only need the current image as the observation to complete exploration tasks. The most suitable action sequence is selected from sampled candidates to generate control signals. We evaluate the robustness of COAL in both simulated and real environments. Only 41 min of unlabeled training data is required for COAL to explore environments it has never seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data with other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.
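To make the training and deployment recipe in the abstract concrete, below is a minimal PyTorch sketch of a CLIP-style bidirectional (symmetric InfoNCE) objective between an observation encoder and an action-sequence encoder, together with the deployment-time step of scoring sampled action sequences against the current image in latent space. The encoder architectures, latent dimension, temperature, and the names ObsEncoder, ActionEncoder, bidirectional_infonce, and select_action_sequence are illustrative assumptions, not the authors' implementation (the mask-based separation of dissimilar features is also omitted); see https://github.com/wzm206/COAL for the released code.

```python
# Illustrative sketch only: a CLIP-style observation/action-sequence alignment,
# not the COAL implementation. All module names and hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObsEncoder(nn.Module):
    """Maps an RGB observation to a normalized latent feature (placeholder CNN)."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, img):  # img: (B, 3, H, W)
        return F.normalize(self.net(img), dim=-1)


class ActionEncoder(nn.Module):
    """Maps an action sequence (e.g., T x 2 velocity commands) to a normalized latent feature."""
    def __init__(self, horizon: int = 8, act_dim: int = 2, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(horizon * act_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, acts):  # acts: (B, T, act_dim)
        return F.normalize(self.net(acts), dim=-1)


def bidirectional_infonce(z_obs, z_act, temperature: float = 0.07):
    """Symmetric (obs->action and action->obs) InfoNCE loss on a batch of paired embeddings."""
    logits = z_obs @ z_act.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_obs.size(0), device=z_obs.device)
    loss_o2a = F.cross_entropy(logits, targets)          # match each observation to its action sequence
    loss_a2o = F.cross_entropy(logits.t(), targets)      # and each action sequence to its observation
    return 0.5 * (loss_o2a + loss_a2o)


@torch.no_grad()
def select_action_sequence(obs_encoder, act_encoder, img, candidate_seqs):
    """Deployment-time selection: score sampled action sequences by latent
    similarity with the current image and return the best-matching one."""
    z_obs = obs_encoder(img.unsqueeze(0))                # (1, D)
    z_act = act_encoder(candidate_seqs)                  # (N, D)
    scores = (z_act @ z_obs.t()).squeeze(1)              # (N,)
    return candidate_seqs[scores.argmax()]


# Example usage with random tensors (shapes only, for illustration):
# obs = torch.randn(16, 3, 96, 96); acts = torch.randn(16, 8, 2)
# loss = bidirectional_infonce(ObsEncoder()(obs), ActionEncoder()(acts))
```

In this kind of setup, the symmetric cross-entropy over the similarity matrix is one common way to maximize a lower bound on the bidirectional mutual information between the two modalities, and at deployment only the cheap dot-product scoring is needed once candidate action sequences have been sampled.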

Source journal
Journal of Field Robotics (Engineering & Technology, Robotics)
CiteScore: 15.00
Self-citation rate: 3.60%
Articles per year: 80
Review time: 6 months
Journal description: The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments. The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.