Latest Articles in Information Fusion

When explainable artificial intelligence meets data governance: Enhancing trustworthiness in multimodal gas classification
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 125, Article 103440 | Pub Date: 2025-06-25 | DOI: 10.1016/j.inffus.2025.103440
Sapdo Utomo, Ayush Pratap, Periyasami Karthikeyan, John Ayeelyan, Hsiu-Chun Hsu, Pao-Ann Hsiung
Abstract: In artificial intelligence, incorporating multimodal data has become an increasingly popular way to improve model performance by offering a more comprehensive range of information for learning. Gas classification, which is of utmost importance in fields such as industry, security, and healthcare, has received considerable interest. Nevertheless, a review of current research indicates that numerous studies suffer from inadequate data governance, resulting in subpar performance despite the complexity of their proposed methodologies. Although explainable artificial intelligence (XAI) techniques are gaining recognition for their ability to help researchers analyze and enhance model performance, their use in multimodal gas classification is still limited. This research presents a method that integrates strong data governance practices with XAI to improve the accuracy of models in classifying different types of gases from multimodal input. Our approach enhances data quality and offers a highly efficient model architecture with only 0.8 million parameters. The proposed model attains testing accuracies of 98.49% for the sensor modality, 96.48% for the image modality, and 99.4% for the fusion modality, surpassing the maximum accuracy achieved by existing state-of-the-art models, which stands at 99.2% with 106 million parameters. Notably, our model is 132.5 times smaller than the most accurate model currently used in multimodal gas classification studies. The proposed model's robustness and trustworthiness are confirmed by extensive testing. The results demonstrate that our approach makes a significant contribution to the field of multimodal classification.
Citations: 0
Dataset-aware Utopia modality contribution for imbalanced multimodal learning
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103383 | Pub Date: 2025-06-24 | DOI: 10.1016/j.inffus.2025.103383
Ying Zhou, Xuefeng Liang, Yue Xu, Xiuyun Lin
Abstract: Multimodal imbalance is a critical issue in multimodal learning research. Over recent years, numerous modulation strategies have been proposed, with a core focus on minimizing disparities in contributions across modalities. However, we observe that in many datasets the contribution proportions of modalities are inherently unequal. Consequently, diagnosing multimodal imbalanced learning based on the criterion of equal contributions and optimizing models by minimizing modality contribution disparities often result in suboptimal performance. To address this issue, we propose the concept of "Utopia Contribution", which estimates the utopia contribution distribution of each modality based on dataset-specific characteristics. This distribution serves as the optimization objective for modulation strategies, facilitating the comprehensive exploitation of information from all modalities. Specifically, based on the principle of population risk, we estimate the utopia contribution distribution of modalities in the given dataset by analyzing the impact of each modality's presence or absence on model predictions. Additionally, to enhance the generalizability of our method, we further propose a model-agnostic approach based on mutual information to estimate the factual contribution distribution of each modality. During training, we employ KL (Kullback-Leibler) divergence to align the factual contribution distribution with the utopia contribution distribution. Extensive experiments on three benchmark datasets (IEMOCAP, CMU-MOSEI, and AVE) demonstrate the rationality, reliability, and effectiveness of our method.
Citations: 0
FERmc: Facial expression recognition framework based on multi-branch fusion and depthwise separable convolution
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103416 | Pub Date: 2025-06-24 | DOI: 10.1016/j.inffus.2025.103416
Jiquan Li, Zhiquan Liu, Wang Zhou, Amin Ul Haq, Abdus Saboor
Abstract: Facial expressions are often conveyed through specific key areas of the face. Facial expression recognition has garnered widespread attention and become a trending research topic in human-computer interaction, healthcare, and virtual reality. However, inter-class similarity between different expressions is rather high, which can make feature extraction difficult and classification computationally expensive. In this article, we introduce a new facial expression recognition framework named FERmc, based on multi-branch fusion and depthwise separable convolution. Specifically, we design a neural network with multiple convolutional branches to adaptively capture features at different scales from the images. By employing attention modules, the network automatically focuses on the most discriminative local regions of the images, improving the robustness of the feature representation. Moreover, the attention mechanism allows FERmc to extract features more efficiently and effectively. To evaluate the performance of FERmc, extensive experiments are conducted on three facial expression recognition datasets. The performance analysis indicates that FERmc achieves high performance and significantly outperforms other benchmark algorithms, demonstrating its superiority and effectiveness in facial expression recognition tasks.
Citations: 0
Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 125, Article 103419 | Pub Date: 2025-06-24 | DOI: 10.1016/j.inffus.2025.103419
Xiaorui Ma, Haoran Xie, S. Joe Qin
Abstract: Integrating Large Language Models (LLMs) with visual modalities has become a central focus in multimodal AI. However, the high computational cost associated with Vision Large Language Models (VLLMs) limits their accessibility, restricting broader use across research communities and real-world deployments. Based on a comprehensive review of 36 high-quality image-text VLLMs, this survey categorizes vision integration into three training paradigms, each employing distinct approaches to improve parameter efficiency. Single-stage Tuning combines pretraining with few-shot learning and achieves strong generalization using minimal labeled data by training only the Modality Integrator (MI). Two-stage Tuning enhances performance through instruction tuning, multi-task learning, or reinforcement learning while improving efficiency via selective MI training, reparameterization modules, and lightweight LLMs. Direct Adaptation skips pretraining and directly finetunes the model on vision-language tasks, achieving efficiency by embedding lightweight MIs into frozen LLMs. These training paradigms have enabled practical applications in areas such as visual assistance, mobile device deployment, medical analysis, agricultural monitoring, and autonomous driving under resource constraints. Despite these advances, each paradigm faces distinct limitations: Single-stage Tuning struggles with few-shot transfer, Two-stage Tuning remains computationally expensive, and Direct Adaptation shows limited generalization ability. Correspondingly, future progress will require more effective pretraining strategies for better few-shot transfer in Single-stage Tuning, optimized use of lightweight LLMs in Two-stage Tuning, and broader adoption of instruction tuning in Direct Adaptation to improve generalization under resource constraints.
Citations: 0
Multi-Category Fusion Contrastive Learning with Core Data Selection for Robust RGB Image-based Dental Caries Classification
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103390 | Pub Date: 2025-06-23 | DOI: 10.1016/j.inffus.2025.103390
Peiliang Zhang, Yaru Chen, Yunjiong Liu, Chao Che, Yongjun Zhu
Abstract: Dental caries is one of the most prevalent diseases affecting humankind, particularly among adolescent populations. RGB images offer a convenient and cost-effective method for dental caries detection. However, the captured image data may suffer from blurriness, which, together with label errors introduced during manual annotation, can degrade the performance of models learned for dental caries detection. To address this problem, we propose Multi-Category Fusion Contrastive Learning with Core Data Selection (M3C) to improve the predictive performance of dental caries classification models. Instead of fine-tuning the backbone network structure, M3C improves the model's robustness to label errors from a novel perspective by identifying core data that is highly relevant to each dental caries category. We analyze and validate M3C's robustness in dental caries detection through its model architecture, theoretical analysis, and mutual information computation. Specifically, M3C quantifies the average mutual information between dental caries images and dental caries category centers based on the Jensen-Shannon Divergence (JSD), which is then used to select the core data and mitigate the impact of label errors on model performance. Furthermore, we design inter-category contrastive learning to enhance the model's ability to distinguish dental caries categories by improving the feature representation of samples from different categories. With theoretical justification, we jointly optimize model training using a prediction loss and a confusion contrastive loss. Extensive experiments demonstrate that M3C significantly surpasses comparative data selection methods on dental caries RGB image datasets. More excitingly, M3C achieves superior predictive performance using only 50% of the core data compared to state-of-the-art dental caries detection methods using the entire dataset. Our code is available at: https://github.com/papercodeforreview/Caries_Detection_Journal.
Citations: 0
Metric learning-enhanced semi-supervised Graph Convolutional Network for multi-view learning
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103420 | Pub Date: 2025-06-23 | DOI: 10.1016/j.inffus.2025.103420
Huaiyuan Xiao, Fadi Dornaika, Jinan Charafeddine, Jingjun Bi
Abstract: Multi-view learning utilizes data from diverse perspectives or modalities, integrating complementary information from various sources. It plays a crucial role in intelligent systems and finds extensive applications in fields such as computer vision, recommender systems, and natural language processing. With the increasing complexity and heterogeneity of real-world data, integrating Graph Convolutional Networks (GCNs) into multi-view learning scenarios is becoming increasingly important. Despite the advances in GCNs, effectively generalizing models and improving their stability across different data views remains a major challenge. In this paper, we present a novel framework, the Enhanced Triplet Loss Based Semi-Supervised Graph Convolutional Network for Multi-View Learning (MV-TriGCN), which addresses these challenges through three primary innovations. First, we propose an enhanced triplet loss for deep metric learning tailored to the hidden features of GCNs, based on semi-hard negative sample selection. Second, view graphs are constructed using the classical KNN scheme and a semi-supervised flexible method to improve the diversity of data structure representation, resulting in a more stable hypothesis space. Moreover, we learn an end-to-end multi-view GCN by merging all available graphs and aggregating the cross-entropy and deep metric losses. Finally, we introduce a stepwise training strategy that allows the model to adapt to the losses during different optimization phases. Extensive experiments show that our method outperforms existing state-of-the-art approaches in terms of accuracy and stability.
Citations: 0
Accurate forecasting on few-shot learning with a novel inference foundation model
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103370 | Pub Date: 2025-06-23 | DOI: 10.1016/j.inffus.2025.103370
Peng-Cheng Li, Yan-Wu Wang, Jiang-Wen Xiao
Abstract: Accurate forecasting with limited data remains a significant challenge, especially for deep learning models that require large-scale training data to map historical data to future data. While Meta-learning (MeL) and Transfer Learning (TL) are useful, they have limitations: MeL assumes shared task structures, which may not apply to unique tasks, and TL requires domain similarity, often failing when distributions differ. Importantly, this paper reveals that future trend changes are often embedded in historical data, regardless of dataset size. However, deep learning models struggle to learn these trends from small training datasets because they rely on extensive historical information to map the past to the future. To address this gap, a novel inference foundation model is designed to uncover intrinsic change patterns within the data rather than relying on extensive historical information. Inspired by gene evolution, our approach decomposes historical data into subsequences (genes), selects optimal genes, and combines them into evolutionary chains based on temporal relationships. Each chain represents a potential future trend. Through five generations of selection and recombination, the best gene sequence is identified for forecasting. The proposed model outperforms all state-of-the-art models across three experiments involving eight datasets. Specifically, it achieves a 27% improvement over the best-performing MeL-based and TL-based models. Furthermore, it shows an average improvement of 38% over other leading models, including Transformer-based, multiscale-based, linear-based, MLP-based, and convolution-based models.
Citations: 0
Dual-task optimization with multi-dimensional feature interaction for influencer recommendation
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103413 | Pub Date: 2025-06-23 | DOI: 10.1016/j.inffus.2025.103413
Jinbao Song, Xingyu Zhang, Di Huang, Wenwen Yang
Abstract: Influencer marketing has emerged as a critical strategy for brands to enhance audience engagement, yet existing recommendation systems often fail to effectively integrate multi-modal features or model complex interactions between brands and influencers. To address these limitations, this paper introduces MFI-IR, a dual-task optimization framework designed to enhance influencer recommendation through multi-dimensional feature interaction. The framework integrates feature interactions across four key dimensions: cross-modal topic distributions, visual styles, industry labels, and sentiment orientations. By combining explicit polynomial feature interactions with implicit high-order relation mining, MFI-IR dynamically models both shallow and deep feature correlations. A dual-task optimization strategy is designed to jointly minimize a matching loss and a ranking loss, balancing recommendation accuracy and stability. Experimental results on a publicly available Instagram dataset demonstrate significant performance improvements, achieving an AUC of 0.9371 (6% higher than the best baseline) and a MAP of 0.9079 (a 3.8× improvement). The key innovations of this work include: (1) a holistic feature fusion approach that eliminates reliance on single-modality representations by unifying topic, visual, industry, and sentiment features; (2) a hybrid interaction architecture that captures both explicit and implicit feature relationships; and (3) a dual-objective learning mechanism that optimizes the matching and ranking tasks simultaneously.
Citations: 0
Association-based concept-cognitive learning for classification: Fusing knowledge with distance metric learning
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103386 | Pub Date: 2025-06-22 | DOI: 10.1016/j.inffus.2025.103386
Chengling Zhang, Guangming Xue, Weihua Xu, Huilai Zhi, Yinfeng Zhou, Eric C.C. Tsang
Abstract: Concept-cognitive learning, which emphasizes the representation and learning of the knowledge incorporated within data, has yielded excellent results in classification research. However, learning concepts from a high-dimensional dataset is a time-consuming and complex process that extracts redundant information and degrades classification performance. Most existing neighborhood concepts generated from neighborhood similarity granules use a single predefined distance function and ignore the decision labels, so the learned distance function is not optimal. Moreover, current concept-cognitive learning methods do not fully exploit the advantages of granular concepts and neighborhood concepts, resulting in weak interpretability. To address these issues, we introduce a novel association-based concept-cognitive learning method with distance metric learning for knowledge fusion and concept classification. Concretely, to reduce the dimensionality of the dataset and remove interfering information, a representative attribute set is first selected from attribute clusters based on a correlation coefficient matrix. Subsequently, neighborhood similarity granules based on distance metric learning are used to construct fuzzy concepts. To obtain the fuzzy concept of maximum contribution, we present a valid fuzzy concept associative space related to clues in the human brain. Furthermore, a fuzzy concept-cognitive associative learning with distance metric learning (FCADML) model is proposed, which achieves concept clustering and class prediction by fusing objects and attributes within fuzzy concepts. Finally, a classification performance evaluation on thirteen datasets verifies the feasibility and efficiency of the proposed learning mechanism.
Citations: 0
FusionSegNet: A Hierarchical Multi-Axis Attention and gated feature fusion network for breast lesion segmentation with uncertainty modeling in ultrasound imaging
IF 14.7 | CAS Zone 1 | Computer Science
Information Fusion, Volume 124, Article 103399 | Pub Date: 2025-06-21 | DOI: 10.1016/j.inffus.2025.103399
Md Rayhan Ahmed, Patricia Lasserre
Abstract: Lesion segmentation in breast ultrasound (BUS) images is challenging due to noise, low contrast, ambiguous boundaries, texture inconsistencies, and inherent uncertainty in lesion appearance. These challenges are further exacerbated by the semantic gap between encoder and decoder features in U-Net-based models. In this paper, we introduce FusionSegNet, a novel lesion segmentation network that integrates several key innovations to address these challenges. First, we propose a Fuzzy Logic-Based Multi-Scale Contextual Network as the encoder to handle noisy and uncertain areas through multi-scale attention and fuzzy membership-based uncertainty estimation. Second, we design a Weighted Multiplicative Fusion Module to effectively merge multi-scale features while suppressing noise. Third, we integrate Hierarchical Multi-Axis Attention in both the encoder and decoder to enhance focus across multiple dimensions, enabling FusionSegNet to better segment targets with varying positions, scales, and sizes. Fourth, we introduce a Gated Multi-Scale Feature Aggregation Module that bridges local and global information for better semantic understanding, and the newly integrated Atrous Attention Fusion Module further refines multi-scale long-range contextual details using different dilation rates. Finally, we design a Gated Multi-Scale Fusion Block that facilitates feature fusion between the encoder and decoder to maintain spatial consistency. Extensive experiments and a comprehensive ablation study on two benchmark BUS datasets validate the superiority of FusionSegNet and its integrated design choices over state-of-the-art methods. FusionSegNet achieves an mDSC of 93.22% on the UDIAT dataset and an mIoU of 80.10% on the BUSI dataset, establishing a new benchmark for lesion segmentation in BUS images. Our code can be found at https://github.com/rayhan-ahmed91/FusionSegNet.
Citations: 0