{"title":"基于鲁棒对比学习的视觉导航框架","authors":"Zengmao Wang, Jianhua Hu, Qifei Tang, Wei Gao","doi":"10.1002/rob.22508","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Real-world robots will face a wide variety of complex environments when performing navigation or exploration tasks, especially in situations where the robots have never been seen before. Usually, robots need to establish local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a wild grassy path or pavement on either side of a road, it is difficult for robots to plan routes through navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning called Contrastive Observation–Action in Latent (COAL) space. To extract features from the action space and observation space, respectively, COAL uses two different encoders. At the training stage, COAL does not require any data annotation and a mask approach is employed to keep features with significant differences away from each other in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which can enhance the generalization of the model. At the deployment stage, robots only need the current image as observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data for generating control signals. We evaluate the robustness of COAL in both simulation and real environments. Only 41 min of unlabeled training data is required to allow COAL to explore environments that have never been seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data using other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.</p>\n </div>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"42 5","pages":"2028-2041"},"PeriodicalIF":4.2000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"COAL: Robust Contrastive Learning-Based Visual Navigation Framework\",\"authors\":\"Zengmao Wang, Jianhua Hu, Qifei Tang, Wei Gao\",\"doi\":\"10.1002/rob.22508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Real-world robots will face a wide variety of complex environments when performing navigation or exploration tasks, especially in situations where the robots have never been seen before. Usually, robots need to establish local or global maps and then use path planning algorithms to determine their routes. However, in some environments, such as a wild grassy path or pavement on either side of a road, it is difficult for robots to plan routes through navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning called Contrastive Observation–Action in Latent (COAL) space. To extract features from the action space and observation space, respectively, COAL uses two different encoders. 
At the training stage, COAL does not require any data annotation and a mask approach is employed to keep features with significant differences away from each other in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which can enhance the generalization of the model. At the deployment stage, robots only need the current image as observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data for generating control signals. We evaluate the robustness of COAL in both simulation and real environments. Only 41 min of unlabeled training data is required to allow COAL to explore environments that have never been seen before, even at night. Compared with state-of-the-art methods, COAL has the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data using other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.</p>\\n </div>\",\"PeriodicalId\":192,\"journal\":{\"name\":\"Journal of Field Robotics\",\"volume\":\"42 5\",\"pages\":\"2028-2041\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-01-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Field Robotics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/rob.22508\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rob.22508","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Real-world robots face a wide variety of complex environments when performing navigation or exploration tasks, especially environments they have never seen before. Usually, robots need to build local or global maps and then use path-planning algorithms to determine their routes. However, in some environments, such as a grassy path in the wild or the pavement on either side of a road, it is difficult for robots to plan routes from navigation maps. To address this, we propose a robust framework for robot navigation using contrastive learning, called Contrastive Observation–Action in Latent (COAL) space. COAL uses two separate encoders to extract features from the observation space and the action space, respectively. At the training stage, COAL requires no data annotation, and a masking approach is employed to push features with significant differences apart in latent space. Similar to multimodal contrastive learning, we maximize bidirectional mutual information to align the features of observations and action sequences in latent space, which enhances the generalization of the model. At the deployment stage, robots need only the current image as the observation to complete exploration tasks. The most suitable action sequence is selected from the sampled data to generate control signals. We evaluate the robustness of COAL in both simulated and real environments. Only 41 min of unlabeled training data are required for COAL to explore environments it has never seen before, even at night. Compared with state-of-the-art methods, COAL shows the strongest robustness and generalization ability. More importantly, the robustness of COAL is further improved by augmenting our training data with other open-source data sets, which indicates that our framework has great potential to extract deep features of observations and action sequences. Our code and trained models are available at https://github.com/wzm206/COAL.
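As a minimal, hypothetical sketch of the bidirectional mutual-information objective described in the abstract, the following PyTorch-style code computes a symmetric InfoNCE (CLIP-style) contrastive loss between observation embeddings and action-sequence embeddings. The encoder outputs, the temperature value, and the omission of the paper's masking strategy are assumptions for illustration, not the released COAL implementation.

import torch
import torch.nn.functional as F

def bidirectional_infonce(obs_emb: torch.Tensor,
                          act_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # obs_emb, act_emb: (B, D) features from the observation and action encoders.
    # L2-normalize so dot products become cosine similarities.
    obs_emb = F.normalize(obs_emb, dim=-1)
    act_emb = F.normalize(act_emb, dim=-1)
    # (B, B) similarity matrix: entry (i, j) compares observation i with action sequence j.
    logits = obs_emb @ act_emb.t() / temperature
    targets = torch.arange(obs_emb.size(0), device=obs_emb.device)
    # Symmetric cross-entropy: observation-to-action and action-to-observation directions.
    loss_o2a = F.cross_entropy(logits, targets)
    loss_a2o = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_o2a + loss_a2o)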
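The deployment step can likewise be sketched as a latent-space lookup: embed the current camera image once, embed a batch of sampled candidate action sequences, and pick the candidate whose feature is most similar to the observation feature. The encoder interfaces and the cosine-similarity selection criterion below are assumptions, not the released COAL API.

import torch
import torch.nn.functional as F

@torch.no_grad()
def select_action_sequence(obs_encoder: torch.nn.Module,
                           act_encoder: torch.nn.Module,
                           image: torch.Tensor,        # (3, H, W) current camera frame
                           candidates: torch.Tensor    # (N, T, A) sampled action sequences
                           ) -> torch.Tensor:
    # Hypothetical selection rule: highest cosine similarity in the shared latent space.
    obs_feat = F.normalize(obs_encoder(image.unsqueeze(0)), dim=-1)  # (1, D)
    act_feat = F.normalize(act_encoder(candidates), dim=-1)          # (N, D)
    scores = (act_feat @ obs_feat.t()).squeeze(-1)                   # (N,)
    return candidates[torch.argmax(scores)]                          # (T, A) chosen sequence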
Journal Introduction:
The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments.
The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.