{"title":"AGFNet: RGB-T语义分割自适应门控融合网络","authors":"Xiaofei Zhou;Xiaoling Wu;Liuxin Bao;Haibing Yin;Qiuping Jiang;Jiyong Zhang","doi":"10.1109/TITS.2025.3528064","DOIUrl":null,"url":null,"abstract":"RGB-T semantic segmentation can effectively pop-out objects from challenging scenarios (e.g., low illumination and low contrast environments) by combining RGB and thermal infrared images. However, the existing cutting-edge RGB-T semantic segmentation methods often present insufficient exploration of multi-modal feature fusion, where they overlook the differences between the two modalities. In this paper, we propose an adaptive gated fusion network (AGFNet) to conduct RGB-T semantic segmentation, where the multi-modal features are combined via the gating mechanisms and the spatial details are enhanced via the introduction of edge information. Specifically, the AGFNet employs a cross-modal adaptive gated-attention fusion (CAGF) module to aggregate the RGB and thermal features, where we give a sufficient exploration of the complementarity between the two-modal features via the gated attention unit (GAU). Particularly, in GAU, the gates can be used to purify the features, and the channel and spatial attention mechanisms are further employed to enhance the two-modal features interactively. Then, we design an edge detection (ED) module to learn the object-related edge cues, which simultaneously incorporates local detail information from low-level features and global location information from high-level features. After that, we deploy the edge guidance (EG) module to emphasize the spatial details of the fused features. Next, we deploy the contextual elevation (CE) module to enrich the contextual information of features by iteratively introducing the sine and cosine functions. 
Finally, considering that the quality of thermal images is usually lower than that of RGB images, we progressively integrate the multi-level RGB encoder features with multi-level decoder features, thereby focusing more on appearance information. Following this way, we can acquire the final high-quality segmentation result. Extensive experiments are performed on three public datasets including MFNet, PST900 and FMB datasets, and the experimental results show that our method achieves competitive performance when compared with the 22 state-of-the-art methods.","PeriodicalId":13416,"journal":{"name":"IEEE Transactions on Intelligent Transportation Systems","volume":"26 5","pages":"6477-6492"},"PeriodicalIF":8.4000,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AGFNet: Adaptive Gated Fusion Network for RGB-T Semantic Segmentation\",\"authors\":\"Xiaofei Zhou;Xiaoling Wu;Liuxin Bao;Haibing Yin;Qiuping Jiang;Jiyong Zhang\",\"doi\":\"10.1109/TITS.2025.3528064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RGB-T semantic segmentation can effectively pop-out objects from challenging scenarios (e.g., low illumination and low contrast environments) by combining RGB and thermal infrared images. However, the existing cutting-edge RGB-T semantic segmentation methods often present insufficient exploration of multi-modal feature fusion, where they overlook the differences between the two modalities. In this paper, we propose an adaptive gated fusion network (AGFNet) to conduct RGB-T semantic segmentation, where the multi-modal features are combined via the gating mechanisms and the spatial details are enhanced via the introduction of edge information. 
Specifically, the AGFNet employs a cross-modal adaptive gated-attention fusion (CAGF) module to aggregate the RGB and thermal features, where we give a sufficient exploration of the complementarity between the two-modal features via the gated attention unit (GAU). Particularly, in GAU, the gates can be used to purify the features, and the channel and spatial attention mechanisms are further employed to enhance the two-modal features interactively. Then, we design an edge detection (ED) module to learn the object-related edge cues, which simultaneously incorporates local detail information from low-level features and global location information from high-level features. After that, we deploy the edge guidance (EG) module to emphasize the spatial details of the fused features. Next, we deploy the contextual elevation (CE) module to enrich the contextual information of features by iteratively introducing the sine and cosine functions. Finally, considering that the quality of thermal images is usually lower than that of RGB images, we progressively integrate the multi-level RGB encoder features with multi-level decoder features, thereby focusing more on appearance information. Following this way, we can acquire the final high-quality segmentation result. 
Extensive experiments are performed on three public datasets including MFNet, PST900 and FMB datasets, and the experimental results show that our method achieves competitive performance when compared with the 22 state-of-the-art methods.\",\"PeriodicalId\":13416,\"journal\":{\"name\":\"IEEE Transactions on Intelligent Transportation Systems\",\"volume\":\"26 5\",\"pages\":\"6477-6492\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2025-01-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Intelligent Transportation Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10858005/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, CIVIL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Transportation Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10858005/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, CIVIL","Score":null,"Total":0}
AGFNet: Adaptive Gated Fusion Network for RGB-T Semantic Segmentation
RGB-T semantic segmentation can effectively pop out objects in challenging scenarios (e.g., low-illumination and low-contrast environments) by combining RGB and thermal infrared images. However, existing cutting-edge RGB-T semantic segmentation methods often explore multi-modal feature fusion insufficiently, overlooking the differences between the two modalities. In this paper, we propose an adaptive gated fusion network (AGFNet) for RGB-T semantic segmentation, in which multi-modal features are combined via gating mechanisms and spatial details are enhanced by introducing edge information. Specifically, AGFNet employs a cross-modal adaptive gated-attention fusion (CAGF) module to aggregate the RGB and thermal features, thoroughly exploring the complementarity between the two modalities via the gated attention unit (GAU). In particular, within the GAU, the gates purify the features, and channel and spatial attention mechanisms are further employed to enhance the two modal features interactively. We then design an edge detection (ED) module to learn object-related edge cues, which simultaneously incorporates local detail information from low-level features and global location information from high-level features. After that, we deploy the edge guidance (EG) module to emphasize the spatial details of the fused features. Next, we deploy the contextual elevation (CE) module to enrich the contextual information of features by iteratively introducing sine and cosine functions. Finally, considering that the quality of thermal images is usually lower than that of RGB images, we progressively integrate the multi-level RGB encoder features with the multi-level decoder features, thereby focusing more on appearance information. In this way, we obtain the final high-quality segmentation result.
Extensive experiments are performed on three public datasets (MFNet, PST900, and FMB), and the results show that our method achieves competitive performance compared with 22 state-of-the-art methods.
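The core gating idea in the GAU — using a gate to weigh the two modalities before fusion — can be illustrated as an elementwise convex combination of the RGB and thermal features. The sketch below is a minimal NumPy illustration under stated assumptions: the linear gate `w_gate`, the vector shapes, and the function names are all illustrative stand-ins, not the paper's actual learned convolutional gate or attention units.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_rgb, f_t, w_gate):
    """Fuse RGB and thermal feature vectors with a sigmoid gate.

    f_rgb, f_t : (C,) feature vectors from the two modalities
                 (stand-ins for per-pixel encoder features).
    w_gate     : (2C, C) gate weights (an illustrative stand-in for the
                 learned convolutional gate in the paper's GAU).
    """
    # Gate is computed from both modalities and lies in (0, 1) per channel.
    gate = sigmoid(np.concatenate([f_rgb, f_t]) @ w_gate)
    # A gate value near 1 trusts the RGB feature; near 0 trusts thermal.
    return gate * f_rgb + (1.0 - gate) * f_t

C = 8
f_rgb = rng.standard_normal(C)
f_t = rng.standard_normal(C)
w = rng.standard_normal((2 * C, C))
fused = gated_fusion(f_rgb, f_t, w)
```

Because the gate is bounded in (0, 1), each fused channel is guaranteed to lie between the corresponding RGB and thermal values, which is what lets the gate adaptively suppress the noisier modality (e.g., low-quality thermal input) per channel.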
Journal Introduction:
The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as systems that use synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, providing a focus for cooperative activities both internally and externally.