{"title":"基于多辅助任务的移动机器人视觉探索学习扩散策略","authors":"Qifei Tang , Zengmao Wang , Wei Gao","doi":"10.1016/j.robot.2025.105199","DOIUrl":null,"url":null,"abstract":"<div><div>The application of diffusion models into the field of robotics is gaining increasing attention due to its advantages in modeling complex data distributions. In the visual navigation task of mobile robots based on diffusion policy, existing frameworks use the current observation as the guidance condition and adopt a classifier free guidance mode for joint training. However, using diffusion models for end-to-end training may result in feature loss, as the learned features are not well understood, which leading to poor generalization in unknown environments and low navigation success rates. To address the issue of generalization, we proposed a new visual navigation framework called MATdiff from the perspective of visual representation. Our framework utilizes two auxiliary tasks to enhance the representation capability of the Conditioned Observation Network. It leverages depth estimation to extract the geometric features of the environment and employs free-space segmentation to identify safely drivable regions, which are defined as areas free from obstacles and suitable for safe navigation. After the fusion of those features, we use a conditional diffusion model to model the distribution under observation conditions and generate a fixed number of consecutive waypoints. This design of auxiliary tasks ensures that the conditional features pays attention to both geometric and semantic information simultaneously. We conduct experiments in both simulation environments and the real world. Compared with the state-of-the-art methods, our method not only has lighter model parameters but also achieves the highest navigation success rate and a longer average travel distance before collision.</div></div>","PeriodicalId":49592,"journal":{"name":"Robotics and Autonomous Systems","volume":"195 ","pages":"Article 105199"},"PeriodicalIF":5.2000,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MATdiff: Learning diffusion policy with multi-auxiliary task for mobile robot visual exploration\",\"authors\":\"Qifei Tang , Zengmao Wang , Wei Gao\",\"doi\":\"10.1016/j.robot.2025.105199\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The application of diffusion models into the field of robotics is gaining increasing attention due to its advantages in modeling complex data distributions. In the visual navigation task of mobile robots based on diffusion policy, existing frameworks use the current observation as the guidance condition and adopt a classifier free guidance mode for joint training. However, using diffusion models for end-to-end training may result in feature loss, as the learned features are not well understood, which leading to poor generalization in unknown environments and low navigation success rates. To address the issue of generalization, we proposed a new visual navigation framework called MATdiff from the perspective of visual representation. Our framework utilizes two auxiliary tasks to enhance the representation capability of the Conditioned Observation Network. It leverages depth estimation to extract the geometric features of the environment and employs free-space segmentation to identify safely drivable regions, which are defined as areas free from obstacles and suitable for safe navigation. 
After the fusion of those features, we use a conditional diffusion model to model the distribution under observation conditions and generate a fixed number of consecutive waypoints. This design of auxiliary tasks ensures that the conditional features pays attention to both geometric and semantic information simultaneously. We conduct experiments in both simulation environments and the real world. Compared with the state-of-the-art methods, our method not only has lighter model parameters but also achieves the highest navigation success rate and a longer average travel distance before collision.</div></div>\",\"PeriodicalId\":49592,\"journal\":{\"name\":\"Robotics and Autonomous Systems\",\"volume\":\"195 \",\"pages\":\"Article 105199\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-10-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics and Autonomous Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0921889025002969\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Autonomous Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0921889025002969","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
MATdiff: Learning diffusion policy with multi-auxiliary task for mobile robot visual exploration
The application of diffusion models in robotics is attracting increasing attention due to their strength in modeling complex data distributions. In diffusion-policy-based visual navigation for mobile robots, existing frameworks use the current observation as the guidance condition and adopt classifier-free guidance for joint training. However, training diffusion models end to end may result in feature loss, as the learned features are not well understood, leading to poor generalization in unknown environments and low navigation success rates. To address this generalization issue, we propose a new visual navigation framework, MATdiff, designed from the perspective of visual representation. Our framework uses two auxiliary tasks to enhance the representation capability of the Conditioned Observation Network: depth estimation extracts the geometric features of the environment, and free-space segmentation identifies safely drivable regions, defined as obstacle-free areas suitable for safe navigation. After fusing these features, we use a conditional diffusion model to model the distribution conditioned on the observation and to generate a fixed number of consecutive waypoints. This design of auxiliary tasks ensures that the conditional features attend to both geometric and semantic information simultaneously. We conduct experiments in both simulation environments and the real world. Compared with state-of-the-art methods, ours not only has fewer model parameters but also achieves the highest navigation success rate and a longer average travel distance before collision.
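The abstract does not specify the network architectures, loss weighting, or waypoint count, so the following is a minimal PyTorch sketch of the training setup it describes: a shared observation encoder supervised by two auxiliary heads (depth estimation and free-space segmentation), whose pooled feature conditions a DDPM-style denoiser over a fixed-length waypoint sequence. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionedObservationNet(nn.Module):
    """Shared RGB encoder with two auxiliary heads: depth estimation
    (geometric supervision) and free-space segmentation (semantic)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)      # per-pixel depth
        self.freespace_head = nn.Conv2d(feat_dim, 1, 1)  # drivable-region logits
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, rgb):
        f = self.backbone(rgb)
        cond = self.pool(f).flatten(1)  # fused condition vector for the policy
        return cond, self.depth_head(f), self.freespace_head(f)

class WaypointDenoiser(nn.Module):
    """Predicts the noise added to a fixed-length sequence of 2-D waypoints,
    conditioned on the observation feature and the diffusion timestep.
    (A real implementation would likely use a sinusoidal timestep embedding.)"""
    def __init__(self, n_waypoints=8, feat_dim=256, hidden=256):
        super().__init__()
        self.n = n_waypoints
        self.net = nn.Sequential(
            nn.Linear(n_waypoints * 2 + feat_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_waypoints * 2),
        )

    def forward(self, noisy_wp, t, cond):
        x = torch.cat([noisy_wp.flatten(1), cond, t.float().unsqueeze(1)], dim=1)
        return self.net(x).view(-1, self.n, 2)

# One joint training step (DDPM-style) with the multi-task loss.
enc, eps_model = ConditionedObservationNet(), WaypointDenoiser()
rgb = torch.randn(4, 3, 96, 96)                    # batch of observations
wp = torch.randn(4, 8, 2)                          # expert waypoint sequences
depth_gt = torch.rand(4, 1, 12, 12)                # downsampled depth labels
free_gt = torch.randint(0, 2, (4, 1, 12, 12)).float()  # free-space masks

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

t = torch.randint(0, T, (4,))
noise = torch.randn_like(wp)
ab = alpha_bar[t].view(-1, 1, 1)
noisy_wp = ab.sqrt() * wp + (1 - ab).sqrt() * noise  # forward diffusion

cond, depth_pred, free_pred = enc(rgb)
loss = (nn.functional.mse_loss(eps_model(noisy_wp, t, cond), noise)
        + nn.functional.mse_loss(depth_pred, depth_gt)
        + nn.functional.binary_cross_entropy_with_logits(free_pred, free_gt))
loss.backward()
```

The design intent, as the abstract describes it, is that the auxiliary losses force the condition vector to retain both geometric (depth) and semantic (drivable-region) information that a pure end-to-end diffusion objective might otherwise discard.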
Journal Introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.