{"title":"AGFNet: RGB-T语义分割自适应门控融合网络","authors":"Xiaofei Zhou;Xiaoling Wu;Liuxin Bao;Haibing Yin;Qiuping Jiang;Jiyong Zhang","doi":"10.1109/TITS.2025.3528064","DOIUrl":null,"url":null,"abstract":"RGB-T semantic segmentation can effectively pop-out objects from challenging scenarios (e.g., low illumination and low contrast environments) by combining RGB and thermal infrared images. However, the existing cutting-edge RGB-T semantic segmentation methods often present insufficient exploration of multi-modal feature fusion, where they overlook the differences between the two modalities. In this paper, we propose an adaptive gated fusion network (AGFNet) to conduct RGB-T semantic segmentation, where the multi-modal features are combined via the gating mechanisms and the spatial details are enhanced via the introduction of edge information. Specifically, the AGFNet employs a cross-modal adaptive gated-attention fusion (CAGF) module to aggregate the RGB and thermal features, where we give a sufficient exploration of the complementarity between the two-modal features via the gated attention unit (GAU). Particularly, in GAU, the gates can be used to purify the features, and the channel and spatial attention mechanisms are further employed to enhance the two-modal features interactively. Then, we design an edge detection (ED) module to learn the object-related edge cues, which simultaneously incorporates local detail information from low-level features and global location information from high-level features. After that, we deploy the edge guidance (EG) module to emphasize the spatial details of the fused features. Next, we deploy the contextual elevation (CE) module to enrich the contextual information of features by iteratively introducing the sine and cosine functions. 
Finally, considering that the quality of thermal images is usually lower than that of RGB images, we progressively integrate the multi-level RGB encoder features with multi-level decoder features, thereby focusing more on appearance information. Following this way, we can acquire the final high-quality segmentation result. Extensive experiments are performed on three public datasets including MFNet, PST900 and FMB datasets, and the experimental results show that our method achieves competitive performance when compared with the 22 state-of-the-art methods.","PeriodicalId":13416,"journal":{"name":"IEEE Transactions on Intelligent Transportation Systems","volume":"26 5","pages":"6477-6492"},"PeriodicalIF":8.4000,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AGFNet: Adaptive Gated Fusion Network for RGB-T Semantic Segmentation\",\"authors\":\"Xiaofei Zhou;Xiaoling Wu;Liuxin Bao;Haibing Yin;Qiuping Jiang;Jiyong Zhang\",\"doi\":\"10.1109/TITS.2025.3528064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RGB-T semantic segmentation can effectively pop-out objects from challenging scenarios (e.g., low illumination and low contrast environments) by combining RGB and thermal infrared images. However, the existing cutting-edge RGB-T semantic segmentation methods often present insufficient exploration of multi-modal feature fusion, where they overlook the differences between the two modalities. In this paper, we propose an adaptive gated fusion network (AGFNet) to conduct RGB-T semantic segmentation, where the multi-modal features are combined via the gating mechanisms and the spatial details are enhanced via the introduction of edge information. 
Specifically, the AGFNet employs a cross-modal adaptive gated-attention fusion (CAGF) module to aggregate the RGB and thermal features, where we give a sufficient exploration of the complementarity between the two-modal features via the gated attention unit (GAU). Particularly, in GAU, the gates can be used to purify the features, and the channel and spatial attention mechanisms are further employed to enhance the two-modal features interactively. Then, we design an edge detection (ED) module to learn the object-related edge cues, which simultaneously incorporates local detail information from low-level features and global location information from high-level features. After that, we deploy the edge guidance (EG) module to emphasize the spatial details of the fused features. Next, we deploy the contextual elevation (CE) module to enrich the contextual information of features by iteratively introducing the sine and cosine functions. Finally, considering that the quality of thermal images is usually lower than that of RGB images, we progressively integrate the multi-level RGB encoder features with multi-level decoder features, thereby focusing more on appearance information. Following this way, we can acquire the final high-quality segmentation result. 
Extensive experiments are performed on three public datasets including MFNet, PST900 and FMB datasets, and the experimental results show that our method achieves competitive performance when compared with the 22 state-of-the-art methods.\",\"PeriodicalId\":13416,\"journal\":{\"name\":\"IEEE Transactions on Intelligent Transportation Systems\",\"volume\":\"26 5\",\"pages\":\"6477-6492\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2025-01-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Intelligent Transportation Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10858005/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, CIVIL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Transportation Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10858005/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, CIVIL","Score":null,"Total":0}
AGFNet: Adaptive Gated Fusion Network for RGB-T Semantic Segmentation
RGB-T semantic segmentation can effectively pop out objects in challenging scenarios (e.g., low-illumination and low-contrast environments) by combining RGB and thermal infrared images. However, existing cutting-edge RGB-T semantic segmentation methods often explore multi-modal feature fusion insufficiently, overlooking the differences between the two modalities. In this paper, we propose an adaptive gated fusion network (AGFNet) for RGB-T semantic segmentation, in which multi-modal features are combined via gating mechanisms and spatial details are enhanced by introducing edge information. Specifically, AGFNet employs a cross-modal adaptive gated-attention fusion (CAGF) module to aggregate the RGB and thermal features, thoroughly exploring the complementarity between the two modalities via the gated attention unit (GAU). In particular, within the GAU, the gates purify the features, and channel and spatial attention mechanisms are further employed to enhance the two modal features interactively. We then design an edge detection (ED) module to learn object-related edge cues, which simultaneously incorporates local detail information from low-level features and global location information from high-level features. After that, we deploy the edge guidance (EG) module to emphasize the spatial details of the fused features. Next, we deploy the contextual elevation (CE) module to enrich the contextual information of features by iteratively introducing sine and cosine functions. Finally, considering that the quality of thermal images is usually lower than that of RGB images, we progressively integrate the multi-level RGB encoder features with the multi-level decoder features, thereby focusing more on appearance information. In this way, we obtain the final high-quality segmentation result.
Extensive experiments are performed on three public datasets (MFNet, PST900, and FMB), and the results show that our method achieves competitive performance compared with 22 state-of-the-art methods.
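The core gating idea in the GAU — using a gate to weigh the two modalities before fusion — can be illustrated as an elementwise convex combination of the RGB and thermal features. The sketch below is a minimal NumPy illustration under stated assumptions: the linear gate `w_gate`, the vector shapes, and the function names are all illustrative stand-ins, not the paper's actual learned convolutional gate or attention units.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_rgb, f_t, w_gate):
    """Fuse RGB and thermal feature vectors with a sigmoid gate.

    f_rgb, f_t : (C,) feature vectors from the two modalities
                 (stand-ins for per-pixel encoder features).
    w_gate     : (2C, C) gate weights (an illustrative stand-in for the
                 learned convolutional gate in the paper's GAU).
    """
    # Gate is computed from both modalities and lies in (0, 1) per channel.
    gate = sigmoid(np.concatenate([f_rgb, f_t]) @ w_gate)
    # A gate value near 1 trusts the RGB feature; near 0 trusts thermal.
    return gate * f_rgb + (1.0 - gate) * f_t

C = 8
f_rgb = rng.standard_normal(C)
f_t = rng.standard_normal(C)
w = rng.standard_normal((2 * C, C))
fused = gated_fusion(f_rgb, f_t, w)
```

Because the gate is bounded in (0, 1), each fused channel is guaranteed to lie between the corresponding RGB and thermal values, which is what lets the gate adaptively suppress the noisier modality (e.g., low-quality thermal input) per channel.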
Journal Introduction:
The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as systems that use synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, providing a focus for cooperative activities both internally and externally.