Yichen Liu, Junjie Ye, Wangpeng He, Zhiqiang Qu, Ruoxuan Xu
{"title":"NTFNet: RGB-TIR语义分割的先窄后融合网络","authors":"Yichen Liu, Junjie Ye, Wangpeng He, Zhiqiang Qu, Ruoxuan Xu","doi":"10.1007/s10489-025-06411-7","DOIUrl":null,"url":null,"abstract":"<div><p>In recent years, the task of understanding scenes in visible (RGB) and thermal-infrared (TIR) images has garnered increasing interest in the field of computer vision. However, most existing methods employ simplistic fusion strategies to merge features from different modalities. These strategies often overlook the differences in shallow-level features between modalities, thereby reducing the discriminability of the fused features and resulting in suboptimal segmentation performance. To address this issue, we present a novel RGB-TIR semantic segmentation framework, named NTFNet. This framework exploits the potential consistency of semantic-level features to rectify shallow-level features and reduce discrepancies between modalities prior to integration. Specifically, auxiliary encoders are employed at each layer to capture semantically consistent information. To obtain rich multi-modal semantic features, we designed a High-Level Feature Fusion Module (HFFM) that enhances feature representation in both channel and spatial dimensions. Subsequently, the Shallow Feature Difference Rectification Module (SFDRM) is introduced to rectify the difference in shallow-level features. To address the loss of detailed information during the rectification process, the SFDRM incorporates a Detail Attention Mechanism (DAM) to preserve the original detail information, thereby further optimizing the final segmentation results. In the end, a Multi-Scale Feature Fusion module (Multi-Scale FFM) is designed to combine the rectified features. Comprehensive experiments on two public RGB-TIR datasets show that our method significantly outperforms other state-of-the-art approaches in terms of performance.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 6","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-025-06411-7.pdf","citationCount":"0","resultStr":"{\"title\":\"NTFNet: Narrowing-Then-Fusing network for RGB-TIR semantic segmentation\",\"authors\":\"Yichen Liu, Junjie Ye, Wangpeng He, Zhiqiang Qu, Ruoxuan Xu\",\"doi\":\"10.1007/s10489-025-06411-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In recent years, the task of understanding scenes in visible (RGB) and thermal-infrared (TIR) images has garnered increasing interest in the field of computer vision. However, most existing methods employ simplistic fusion strategies to merge features from different modalities. These strategies often overlook the differences in shallow-level features between modalities, thereby reducing the discriminability of the fused features and resulting in suboptimal segmentation performance. To address this issue, we present a novel RGB-TIR semantic segmentation framework, named NTFNet. This framework exploits the potential consistency of semantic-level features to rectify shallow-level features and reduce discrepancies between modalities prior to integration. Specifically, auxiliary encoders are employed at each layer to capture semantically consistent information. To obtain rich multi-modal semantic features, we designed a High-Level Feature Fusion Module (HFFM) that enhances feature representation in both channel and spatial dimensions. 
Subsequently, the Shallow Feature Difference Rectification Module (SFDRM) is introduced to rectify the difference in shallow-level features. To address the loss of detailed information during the rectification process, the SFDRM incorporates a Detail Attention Mechanism (DAM) to preserve the original detail information, thereby further optimizing the final segmentation results. In the end, a Multi-Scale Feature Fusion module (Multi-Scale FFM) is designed to combine the rectified features. Comprehensive experiments on two public RGB-TIR datasets show that our method significantly outperforms other state-of-the-art approaches in terms of performance.</p></div>\",\"PeriodicalId\":8041,\"journal\":{\"name\":\"Applied Intelligence\",\"volume\":\"55 6\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10489-025-06411-7.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-025-06411-7\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06411-7","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
NTFNet: Narrowing-Then-Fusing network for RGB-TIR semantic segmentation
In recent years, the task of understanding scenes in visible (RGB) and thermal-infrared (TIR) images has garnered increasing interest in the field of computer vision. However, most existing methods employ simplistic fusion strategies to merge features from different modalities. These strategies often overlook the differences in shallow-level features between modalities, reducing the discriminability of the fused features and yielding suboptimal segmentation performance. To address this issue, we present a novel RGB-TIR semantic segmentation framework, named NTFNet, which exploits the potential consistency of semantic-level features to rectify shallow-level features and narrow the discrepancies between modalities prior to fusion. Specifically, auxiliary encoders are employed at each layer to capture semantically consistent information. To obtain rich multi-modal semantic features, we design a High-Level Feature Fusion Module (HFFM) that enhances feature representation in both the channel and spatial dimensions. Subsequently, a Shallow Feature Difference Rectification Module (SFDRM) is introduced to rectify the differences in shallow-level features. To counter the loss of detail during rectification, the SFDRM incorporates a Detail Attention Mechanism (DAM) that preserves the original detail information, further improving the final segmentation results. Finally, a Multi-Scale Feature Fusion Module (Multi-Scale FFM) is designed to combine the rectified features. Comprehensive experiments on two public RGB-TIR datasets show that our method significantly outperforms other state-of-the-art approaches.
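The abstract describes the narrowing-then-fusing idea only at a high level, so the following is a minimal PyTorch sketch of how such a pipeline could be wired up, not the authors' implementation: all module internals (the channel/spatial gating in the HFFM stand-in, the guidance-based rectification and detail gate in the SFDRM stand-in) are illustrative assumptions inferred from the module names in the abstract.

```python
# Hedged sketch of a narrowing-then-fusing pipeline. HFFM and SFDRM internals
# are assumptions; only the module roles come from the abstract.
import torch
import torch.nn as nn


class HFFM(nn.Module):
    """Hypothetical High-Level Feature Fusion Module: fuses RGB and TIR
    semantic features via channel attention followed by spatial attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, tir: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, tir], dim=1)          # (B, 2C, H, W)
        fused = self.proj(x)                      # (B, C, H, W)
        fused = fused * self.channel_gate(x)      # channel re-weighting
        # spatial attention from channel-wise mean and max maps
        smap = torch.cat([fused.mean(1, keepdim=True),
                          fused.amax(1, keepdim=True)], dim=1)
        return fused * self.spatial_gate(smap)    # spatial re-weighting


class SFDRM(nn.Module):
    """Hypothetical Shallow Feature Difference Rectification Module: uses the
    fused semantic feature to narrow the gap in a shallow feature before
    fusion, with a detail gate standing in for the DAM."""

    def __init__(self, channels: int):
        super().__init__()
        self.rectify = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.detail_gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, shallow: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        guide = nn.functional.interpolate(
            semantic, size=shallow.shape[-2:], mode="bilinear", align_corners=False
        )
        rectified = self.rectify(shallow + guide)      # narrow the modality gap
        detail = shallow * self.detail_gate(shallow)   # preserve fine detail
        return rectified + detail


if __name__ == "__main__":
    rgb_sem = torch.randn(1, 64, 16, 16)   # high-level RGB feature
    tir_sem = torch.randn(1, 64, 16, 16)   # high-level TIR feature
    shallow = torch.randn(1, 64, 64, 64)   # shallow feature to rectify

    semantic = HFFM(64)(rgb_sem, tir_sem)
    out = SFDRM(64)(shallow, semantic)
    print(out.shape)  # torch.Size([1, 64, 64, 64])
```

Under these assumptions, the key design point is ordering: the semantic-level fusion (HFFM) runs first, and its output then guides the rectification of shallow features (SFDRM) before any shallow-level merging, which is what "narrowing-then-fusing" suggests.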
Journal introduction:
With a focus on research in artificial intelligence and neural networks, this journal addresses the solution of real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments addressing complex, real-world problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.