Expert knowledge-guided multi-granularity multi-scale fusion for weakly-supervised histological segmentation

Xifeng Hu, Yankun Cao, Weifeng Hu, Shanshan Wang, Subhas Chandra Mukhopadhyay, Yu Liu, Huafeng Li, Yujun Li, Qing Cai, Zhi Liu

Information Fusion, Volume 125, Article 103432. Published 2025-06-27. DOI: 10.1016/j.inffus.2025.103432
https://www.sciencedirect.com/science/article/pii/S1566253525005056
Citations: 0
Abstract
Medical image fusion plays a crucial role in enhancing weakly-supervised segmentation performance while alleviating the over-reliance on dense annotations. However, existing methods tend to integrate unfiltered textual data with visual features, resulting in semantic redundancy and ambiguity that ultimately impair visual-textual alignment. Furthermore, they often rely on single-scale fusion schemes, which can lead to the loss of critical semantic information. To address these challenges, we propose an Expert Knowledge-Guided Multi-granularity Multi-scale fusion framework for weakly-supervised histological segmentation, which leverages fine-grained text representations and multi-scale label-pixel fusion and alignment to suppress redundancy and strengthen supervision. Specifically, to address the interference of redundant text on homogeneous pixels, we start by constructing an expert knowledge-guided fine-grained text representation paradigm, progressively refining and extracting key information to uncover the multi-level clues and semantic features of the images. To effectively represent and convey fine-grained guidance knowledge, we propose a multi-scale label-pixel fusion and alignment module that emphasizes the fusion and interaction between fine-grained text prompts and image features, enhancing category sensitivity. Additionally, a visual state-space adaptive layer is embedded into a multi-stage pre-trained transformer encoder to improve long-range dependency modeling at low computational cost. Experiments on public datasets demonstrate the effectiveness of the proposed method: our approach outperforms current methods in both quantitative and qualitative evaluations.
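The abstract gives no implementation details, but the described multi-scale label-pixel fusion and alignment module can be pictured as cross-attention between per-class text-prompt embeddings and image features at several encoder scales. The sketch below is a hypothetical illustration only; every name and design choice (`MultiScaleTextImageFusion`, `dims`, `text_dim`, the residual cross-attention layout) is an assumption for exposition, not taken from the paper.

```python
# Hypothetical sketch: multi-scale text-image fusion via cross-attention.
# All names and design choices are illustrative assumptions; the paper's
# actual module may differ substantially.
import torch
import torch.nn as nn

class MultiScaleTextImageFusion(nn.Module):
    def __init__(self, dims=(96, 192, 384), text_dim=512, num_heads=4):
        super().__init__()
        # One cross-attention block per encoder scale: image tokens attend
        # to fine-grained text (class-prompt) embeddings.
        self.text_proj = nn.ModuleList(nn.Linear(text_dim, d) for d in dims)
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d, num_heads, batch_first=True) for d in dims
        )
        self.norm = nn.ModuleList(nn.LayerNorm(d) for d in dims)

    def forward(self, feats, text_emb):
        # feats: list of (B, H*W, C_s) image token maps, one per scale
        # text_emb: (B, K, text_dim) fine-grained prompt embeddings, K classes
        fused = []
        for f, proj, attn, norm in zip(feats, self.text_proj, self.attn, self.norm):
            t = proj(text_emb)                       # project text to scale dim
            out, _ = attn(query=f, key=t, value=t)   # pixel tokens query text
            fused.append(norm(f + out))              # residual + layer norm
        return fused

if __name__ == "__main__":
    B = 2
    feats = [torch.randn(B, 64 * 64, 96),
             torch.randn(B, 32 * 32, 192),
             torch.randn(B, 16 * 16, 384)]
    text = torch.randn(B, 5, 512)  # e.g. 5 tissue-class prompts
    out = MultiScaleTextImageFusion()(feats, text)
    print([o.shape for o in out])
```

In this reading, using pixel tokens as queries lets each spatial location pull in the class semantics most relevant to it, which is consistent with the abstract's claim that label-pixel fusion improves category sensitivity.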
Journal Introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.