A multiscale feature fusion-guided lightweight semantic segmentation network
Authors: Xin Ye, Junchen Pan, Jichen Chen, Jingbo Zhang
Journal of Field Robotics, vol. 42, no. 1, pp. 272-286. Published 2024-08-08. DOI: 10.1002/rob.22406
https://onlinelibrary.wiley.com/doi/10.1002/rob.22406
Semantic segmentation, the task of assigning a class label to each pixel in an image, has found applications in various real-world scenarios, including autonomous driving and scene understanding. However, its widespread use is hindered by its high computational burden. In this paper, we propose an efficient semantic segmentation method based on the Feature Cascade Fusion Network (FCFNet) to address this challenge. FCFNet uses a dual-path framework comprising the Spatial Information Path (SIP) and the Context Information Path (CIP). SIP is a shallow structure that captures the local dependencies of each pixel to improve the segmentation accuracy of fine details. CIP is the main branch, with a deeper structure that captures sufficient contextual information from the input features. Moreover, we design an Efficient Receptive Field Module (ERFM) to enlarge the receptive field in the SIP. Meanwhile, an Attention Shuffled Refinement Module is used to refine feature maps from different stages. Finally, we present an Attention-Guided Fusion Module to fuse the low- and high-level feature maps effectively. Experimental results show that FCFNet achieves 70.7% mean intersection over union (mIoU) on the Cityscapes data set and 68.1% mIoU on the CamVid data set, with inference speeds of 110 and 100 frames per second (FPS), respectively. Additionally, we evaluated FCFNet on the Nvidia Jetson Xavier embedded device, where it demonstrated competitive performance while significantly reducing power consumption.
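For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou`, the flat label lists, and the ignore label of 255 are illustrative assumptions.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    `ignore_index` (assumed to be 255 here) marks pixels excluded from scoring.
    Predictions are assumed to lie in the range [0, num_classes).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
print(round(mean_iou(cm), 4))  # per-class IoUs are 1/2, 2/3 and 1
```

In practice the confusion matrix is accumulated over the whole validation set before the per-class IoUs are averaged, so a class that is rare in a single image still contributes one term to the mean.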
About the journal:
The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments.
The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.