Expert knowledge-guided multi-granularity multi-scale fusion for weakly-supervised histological segmentation

Xifeng Hu, Yankun Cao, Weifeng Hu, Shanshan Wang, Subhas Chandra Mukhopadhyay, Yu Liu, Huafeng Li, Yujun Li, Qing Cai, Zhi Liu

Information Fusion, Volume 125, Article 103432. Published 2025-06-27. DOI: 10.1016/j.inffus.2025.103432
https://www.sciencedirect.com/science/article/pii/S1566253525005056
Citations: 0
Abstract
Medical image fusion plays a crucial role in enhancing weakly-supervised segmentation performance while alleviating the over-reliance on dense annotations. However, existing methods tend to integrate unfiltered textual data with visual features, resulting in semantic redundancy and ambiguity that ultimately impair visual-textual alignment. Furthermore, they often rely on single-scale fusion schemes, which can lead to the loss of critical semantic information. To address these challenges, we propose an Expert Knowledge-Guided Multi-granularity Multi-scale fusion framework for weakly-supervised histological segmentation, which leverages fine-grained text representations and multi-scale label-pixel fusion and alignment to suppress redundancy and strengthen supervision. Specifically, to address the interference of redundant text on homogeneous pixels, we start by constructing an expert knowledge-guided fine-grained text representation paradigm, progressively refining and extracting key information to uncover the multi-level clues and semantic features of the images. To effectively represent and convey fine-grained guidance knowledge, we propose a multi-scale label-pixel fusion and alignment module that emphasizes the fusion and interaction between fine-grained text prompts and image features, enhancing category sensitivity. Additionally, a visual state-space adaptive layer is embedded into a multi-stage pre-trained transformer encoder to improve long-range dependency modeling at low computational cost. Experiments on public datasets demonstrate the effectiveness of the proposed method: our approach outperforms current methods in both quantitative and qualitative evaluations.
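The abstract gives no implementation details, but the described multi-scale label-pixel fusion and alignment module can be pictured as cross-attention between per-class text-prompt embeddings and image features at several encoder scales. The sketch below is a hypothetical illustration only; every name and design choice (`MultiScaleTextImageFusion`, `dims`, `text_dim`, the residual cross-attention layout) is an assumption for exposition, not taken from the paper.

```python
# Hypothetical sketch: multi-scale text-image fusion via cross-attention.
# All names and design choices are illustrative assumptions; the paper's
# actual module may differ substantially.
import torch
import torch.nn as nn

class MultiScaleTextImageFusion(nn.Module):
    def __init__(self, dims=(96, 192, 384), text_dim=512, num_heads=4):
        super().__init__()
        # One cross-attention block per encoder scale: image tokens attend
        # to fine-grained text (class-prompt) embeddings.
        self.text_proj = nn.ModuleList(nn.Linear(text_dim, d) for d in dims)
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d, num_heads, batch_first=True) for d in dims
        )
        self.norm = nn.ModuleList(nn.LayerNorm(d) for d in dims)

    def forward(self, feats, text_emb):
        # feats: list of (B, H*W, C_s) image token maps, one per scale
        # text_emb: (B, K, text_dim) fine-grained prompt embeddings, K classes
        fused = []
        for f, proj, attn, norm in zip(feats, self.text_proj, self.attn, self.norm):
            t = proj(text_emb)                       # project text to scale dim
            out, _ = attn(query=f, key=t, value=t)   # pixel tokens query text
            fused.append(norm(f + out))              # residual + layer norm
        return fused

if __name__ == "__main__":
    B = 2
    feats = [torch.randn(B, 64 * 64, 96),
             torch.randn(B, 32 * 32, 192),
             torch.randn(B, 16 * 16, 384)]
    text = torch.randn(B, 5, 512)  # e.g. 5 tissue-class prompts
    out = MultiScaleTextImageFusion()(feats, text)
    print([o.shape for o in out])
```

In this reading, using pixel tokens as queries lets each spatial location pull in the class semantics most relevant to it, which is consistent with the abstract's claim that label-pixel fusion improves category sensitivity.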
Journal Introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.