CSFAFormer: Category-selective feature aggregation transformer for multimodal remote sensing image semantic segmentation

IF 15.5 · CAS Tier 1 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
Yue Ni, Donglin Xue, Weijian Chi, Ji Luan, Jiahang Liu
{"title":"CSFAFormer:用于多模态遥感图像语义分割的分类选择性特征聚合转换器","authors":"Yue Ni ,&nbsp;Donglin Xue ,&nbsp;Weijian Chi ,&nbsp;Ji Luan ,&nbsp;Jiahang Liu","doi":"10.1016/j.inffus.2025.103786","DOIUrl":null,"url":null,"abstract":"<div><div>Feature fusion is one of the keys to multimodal data segmentation. Different fusion mechanisms vary significantly in how effectively they utilize inter-modal features, exploit complementary information, and enhance representations, while also greatly affecting model parameters and computational complexity. Cross-attention fusion mechanism (CAFM) is the most widely used feature fusion mechanism in the current multimodal fusion classification task, but due to the inherent limitation, it cannot adapt to the differentiated feature requirements of different classes and leads to the blurring of interclass and dispersal features of intraclass. To address these challenges, a novel Category-Selective Feature Aggregation Transformer (CSFAFormer) is proposed to dynamically adjust the interaction weights between modalities along the class dimension, thereby fully leveraging the complementary advantages of different modalities. To accommodate the differentiated needs of different categories, a Category Cross-Calibration Mechanism (C<sup>3</sup>M) is designed to compress multi-channel features, estimate pixel-level class distributions, and employ a confidence-based cross-calibration strategy to dynamically adjust interaction weights along the class dimension, better accommodating the varying demands of different classes. To further semantic consistency and inter-class separability, a Category-Selective Transformer Module is proposed to leverage the class information calibrated by C<sup>3</sup>M for adaptive weighted fusion along the class dimension, thereby optimizing the representation of category-specific features. Experimental results indicate that CSFAFormer significantly outperforms in segmentation performance. Compared to the CAFM, CSFAFormer reduces the parameter count by 38.5 % and the computational cost by 72.3 %, while maintaining superior performance. The code is available at: <span><span>https://github.com/NUAALISILab/CSFAFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103786"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CSFAFormer: Category-selective feature aggregation transformer for multimodal remote sensing image semantic segmentation\",\"authors\":\"Yue Ni ,&nbsp;Donglin Xue ,&nbsp;Weijian Chi ,&nbsp;Ji Luan ,&nbsp;Jiahang Liu\",\"doi\":\"10.1016/j.inffus.2025.103786\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Feature fusion is one of the keys to multimodal data segmentation. Different fusion mechanisms vary significantly in how effectively they utilize inter-modal features, exploit complementary information, and enhance representations, while also greatly affecting model parameters and computational complexity. Cross-attention fusion mechanism (CAFM) is the most widely used feature fusion mechanism in the current multimodal fusion classification task, but due to the inherent limitation, it cannot adapt to the differentiated feature requirements of different classes and leads to the blurring of interclass and dispersal features of intraclass. 
To address these challenges, a novel Category-Selective Feature Aggregation Transformer (CSFAFormer) is proposed to dynamically adjust the interaction weights between modalities along the class dimension, thereby fully leveraging the complementary advantages of different modalities. To accommodate the differentiated needs of different categories, a Category Cross-Calibration Mechanism (C<sup>3</sup>M) is designed to compress multi-channel features, estimate pixel-level class distributions, and employ a confidence-based cross-calibration strategy to dynamically adjust interaction weights along the class dimension, better accommodating the varying demands of different classes. To further semantic consistency and inter-class separability, a Category-Selective Transformer Module is proposed to leverage the class information calibrated by C<sup>3</sup>M for adaptive weighted fusion along the class dimension, thereby optimizing the representation of category-specific features. Experimental results indicate that CSFAFormer significantly outperforms in segmentation performance. Compared to the CAFM, CSFAFormer reduces the parameter count by 38.5 % and the computational cost by 72.3 %, while maintaining superior performance. The code is available at: <span><span>https://github.com/NUAALISILab/CSFAFormer</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"127 \",\"pages\":\"Article 103786\"},\"PeriodicalIF\":15.5000,\"publicationDate\":\"2025-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525008486\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525008486","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Feature fusion is one of the keys to multimodal data segmentation. Fusion mechanisms differ significantly in how effectively they utilize inter-modal features, exploit complementary information, and enhance representations, and they also strongly affect model size and computational complexity. The cross-attention fusion mechanism (CAFM) is the most widely used feature fusion mechanism in current multimodal fusion classification tasks, but an inherent limitation prevents it from adapting to the differentiated feature requirements of different classes, leading to blurred inter-class boundaries and dispersed intra-class features. To address these challenges, a novel Category-Selective Feature Aggregation Transformer (CSFAFormer) is proposed to dynamically adjust the interaction weights between modalities along the class dimension, thereby fully leveraging the complementary advantages of different modalities. To accommodate the differentiated needs of different categories, a Category Cross-Calibration Mechanism (C3M) is designed to compress multi-channel features, estimate pixel-level class distributions, and apply a confidence-based cross-calibration strategy that dynamically adjusts interaction weights along the class dimension. To further improve semantic consistency and inter-class separability, a Category-Selective Transformer Module leverages the class information calibrated by C3M for adaptive weighted fusion along the class dimension, thereby optimizing the representation of category-specific features. Experimental results indicate that CSFAFormer delivers significantly better segmentation performance. Compared to CAFM, CSFAFormer reduces the parameter count by 38.5% and the computational cost by 72.3% while maintaining superior performance. The code is available at: https://github.com/NUAALISILab/CSFAFormer.
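The abstract describes C3M and the category-selective fusion only at a high level; the authors' implementation is at the GitHub link above. The following is a minimal, hypothetical PyTorch sketch of the idea as we read it: compress multi-channel features into per-class score maps, estimate pixel-level class distributions, cross-calibrate the two modalities by confidence, and fuse them with class-derived weights. All module and variable names (CategorySelectiveFusion, score_a, to_gate, ...) are ours, and the specific calibration and gating rules are assumptions, not the paper's equations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CategorySelectiveFusion(nn.Module):
    """Illustrative sketch (NOT the paper's code) of category-selective fusion
    between two modalities, loosely following the steps in the abstract."""

    def __init__(self, channels: int, num_classes: int) -> None:
        super().__init__()
        # 1x1 convs compress each modality's features to per-class score maps.
        self.score_a = nn.Conv2d(channels, num_classes, kernel_size=1)
        self.score_b = nn.Conv2d(channels, num_classes, kernel_size=1)
        # Map per-pixel class-wise weights (K channels) to a channel gate.
        self.to_gate = nn.Conv2d(num_classes, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Pixel-level class distributions for each modality: (B, K, H, W).
        dist_a = F.softmax(self.score_a(feat_a), dim=1)
        dist_b = F.softmax(self.score_b(feat_b), dim=1)

        # Per-pixel confidence = max class probability: (B, 1, H, W).
        conf_a = dist_a.max(dim=1, keepdim=True).values
        conf_b = dist_b.max(dim=1, keepdim=True).values

        # Confidence-based cross-calibration (our assumption): class-wise
        # evidence per modality, normalized across the two modalities so that
        # the more confident modality dominates each class at each pixel.
        evidence_a = dist_a * conf_a
        evidence_b = dist_b * conf_b
        weight_a = evidence_a / (evidence_a + evidence_b + 1e-6)  # (B, K, H, W)

        # Turn the class-wise weights into a per-channel fusion gate in [0, 1].
        gate = torch.sigmoid(self.to_gate(weight_a))  # (B, C, H, W)

        # Category-selective weighted fusion of the two modalities.
        return gate * feat_a + (1.0 - gate) * feat_b


if __name__ == "__main__":
    # Toy shapes: batch 2, 64 channels, 6 classes, 32x32 feature maps.
    fuse = CategorySelectiveFusion(channels=64, num_classes=6)
    a, b = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(fuse(a, b).shape)  # torch.Size([2, 64, 32, 32])
```

A design note, consistent in spirit with the reported 38.5% parameter and 72.3% computation reductions: class-wise gating of this kind needs only a few 1×1 convolutions and element-wise operations per pixel, whereas cross-attention compares every spatial position with every other, at a cost roughly quadratic in H×W.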
Source journal: Information Fusion (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Average review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.