Multi-aspect fusion in foundational large vision model for visible light medical imaging segmentation

IF 15.5 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xingru Huang, Tianyun Zhang, Zhaoyang Xu, Jian Huang, Gaopeng Huang, Han Yang, Binfeng Zou, Shouqin Ding, Renjie Ruan, Zhao Huang, Huiyu Zhou, Jin Liu, Zhiwen Zheng, Shaowei Jiang, Xiaoshuai Zhang
DOI: 10.1016/j.inffus.2025.103385
Information Fusion, Volume 125, Article 103385. Published 2025-07-11.
https://www.sciencedirect.com/science/article/pii/S1566253525004580
Citations: 0

Abstract

Accurate lesion quantification is a critical component of precision diagnostics and targeted therapeutic strategies, yet current methods struggle with the diverse contexts and complex structures inherent in visible-light medical imaging, including semantic ambiguity, noise interference, and geometric complexity, which collectively hinder segmentation accuracy. To address these challenges, we propose the Multi-Aspect Large Vision Model (MasLVM), a foundational model for optical medical imaging that achieves comprehensive feature fusion through three parallel paths (Tri-Path fusion). The Semantic Context Encoder (SCE) integrates a pre-trained large vision model with global semantic embeddings to improve contextual abstraction and mitigate semantic ambiguity. The Spectral Spline Encoder (SSE), which incorporates the Multi-Frequency Feature Modulator (MFFM) and Kolmogorov–Arnold Network (KAN) Channel Attention, transforms image representations into the frequency domain to selectively attenuate noise while preserving essential structural features. The Hierarchical Deformable Morphometry Encoder (HDME) employs deformable convolutions and multi-scale encoding to dynamically capture heterogeneous geometric structures. The outputs of these branches are combined by the Multi-Attention KAN Decoder, which uses KAN multiple self-attention and iterative attentional fusion to adaptively select and enhance critical semantic, spectral, and morphological domain features. Extensive experiments on six widely recognized datasets demonstrate that MasLVM achieves convincing performance compared with multiple previous state-of-the-art (SoTA) methods, and show its potential for adapting to the diverse requirements of visible-light medical imaging tasks under constrained conditions. The code and model weights can be used directly for deployment on medical tasks or for fine-tuning, and are publicly available at https://github.com/IMOP-lab/MasLVM-Pytorch.
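The abstract states that the SSE's Multi-Frequency Feature Modulator moves features into the frequency domain to attenuate noise selectively while keeping structure. The exact MFFM formulation is not given in the abstract, so the following is only a minimal NumPy sketch of the general idea, assuming the spectrum is split into radial frequency bands with one gain per band. The names `multi_frequency_modulate`, `band_edges`, and `band_gains` are hypothetical; in a trained network the gains would be learned rather than fixed.

```python
import numpy as np


def multi_frequency_modulate(feature, band_edges, band_gains):
    """Rescale the 2-D spectrum of `feature` band by band (hypothetical MFFM-style sketch).

    band_edges: increasing normalized radii, e.g. [0.0, 0.1, 0.25, 0.75]; the largest
    possible radius is about 0.707 (a spectrum corner). band_gains: one gain per band,
    fixed here but learnable in a real network. Bins outside all bands keep gain 1.
    """
    h, w = feature.shape
    spectrum = np.fft.fft2(feature)
    # Normalized radial frequency of every FFT bin; fftfreq returns values in [-0.5, 0.5).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    gain_map = np.ones_like(radius)
    for lo, hi, gain in zip(band_edges[:-1], band_edges[1:], band_gains):
        gain_map[(radius >= lo) & (radius < hi)] = gain
    # The gain map depends only on |f|, so Hermitian symmetry is kept and the
    # inverse transform is real up to floating-point error.
    return np.real(np.fft.ifft2(spectrum * gain_map))


# Usage: keep low frequencies, damp mid frequencies, suppress the highest band.
rng = np.random.default_rng(0)
clean = np.outer(np.hanning(32), np.hanning(32))
noisy = clean + 0.1 * rng.standard_normal((32, 32))
denoised = multi_frequency_modulate(noisy, [0.0, 0.1, 0.25, 0.75], [1.0, 0.6, 0.1])
print("mean abs error before:", np.abs(noisy - clean).mean())
print("mean abs error after: ", np.abs(denoised - clean).mean())
```

Radial bands are one simple design choice among several (per-axis or learned spectral masks are alternatives); the point is only that a per-frequency gain acts as a denoising filter without touching spatial structure carried by low frequencies.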
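The SSE also uses "KAN Channel Attention". Assuming this follows the familiar squeeze-and-excitation pattern (global average pooling, a small network, a sigmoid gate per channel) with the usual MLP replaced by a Kolmogorov–Arnold layer, a NumPy sketch might look as follows. In a KAN layer every input-output edge carries its own learnable univariate function; here Gaussian radial basis functions are swapped in for the B-splines of the original KAN formulation to keep the code short. `RBFKANLayer` and `kan_channel_attention` are hypothetical names, weights are random rather than trained, and none of this is taken from the MasLVM code.

```python
import numpy as np


class RBFKANLayer:
    """Hypothetical KAN layer: each input-output edge (i, j) has its own learnable
    univariate function phi_ij(x) = w_ij * silu(x) + sum_k c_ijk * exp(-((x - g_k) / h) ** 2).
    Gaussian radial basis functions stand in for the B-splines of the original KAN."""

    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0), seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = grid_range
        self.grid = np.linspace(lo, hi, num_basis)            # centers g_k, shape (K,)
        self.width = (hi - lo) / (num_basis - 1)               # shared width h
        self.coef = 0.1 * rng.standard_normal((in_dim, out_dim, num_basis))    # c_ijk
        self.base_w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)  # w_ij

    def __call__(self, x):
        """x: (batch, in_dim) -> (batch, out_dim); output j sums phi_ij over inputs i."""
        silu = x / (1.0 + np.exp(-x))
        basis = np.exp(-(((x[..., None] - self.grid) / self.width) ** 2))  # (B, in, K)
        return silu @ self.base_w + np.einsum("bik,iok->bo", basis, self.coef)


def kan_channel_attention(feature_map, kan):
    """SE-style channel gating with a KAN in place of the usual two-layer MLP.
    feature_map: (C, H, W). Returns the reweighted map and the (C,) gate values."""
    squeezed = feature_map.mean(axis=(1, 2))[None, :]      # global average pool, (1, C)
    gates = 1.0 / (1.0 + np.exp(-kan(squeezed)[0]))         # sigmoid, values in (0, 1)
    return feature_map * gates[:, None, None], gates


channels = 4
kan = RBFKANLayer(channels, channels)
fmap = np.random.default_rng(1).standard_normal((channels, 16, 16))
reweighted, gates = kan_channel_attention(fmap, kan)
print(reweighted.shape, gates.round(3))
```

Because each edge function is a weighted sum of fixed basis functions, the KAN layer stays linear in its trainable coefficients, which is what makes it easy to fit with ordinary gradient descent.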
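The HDME relies on deformable convolutions to follow irregular lesion shapes. In a deformable convolution each kernel tap samples the input at its regular grid position plus a learned fractional offset, read by bilinear interpolation. The single-channel, loop-based NumPy sketch below illustrates only that sampling rule; `bilinear_sample` and `deformable_conv3x3` are hypothetical helpers, and the offsets are passed in directly, whereas in a real network they are predicted by a parallel convolution (PyTorch users would normally reach for `torchvision.ops.deform_conv2d` instead).

```python
import numpy as np


def bilinear_sample(img, y, x):
    """Sample img (H, W) at fractional (y, x); points outside the image read as zero."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance to each of the 4 neighbours.
                value += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy, xx]
    return value


def deformable_conv3x3(img, kernel, offsets):
    """Single-channel 3x3 deformable convolution (stride 1, zero padding 1).
    offsets: (H, W, 9, 2) array of (dy, dx) shifts, one per output pixel and kernel tap."""
    h, w = img.shape
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            for t, (ky, kx) in enumerate(taps):
                dy, dx = offsets[i, j, t]
                sample = bilinear_sample(img, i + ky + dy, j + kx + dx)
                out[i, j] += kernel[ky + 1, kx + 1] * sample
    return out


# Usage: a center-only kernel with every tap shifted half a pixel downward.
img = np.arange(20, dtype=float).reshape(4, 5)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0
offsets = np.zeros((4, 5, 9, 2))
offsets[..., 0] = 0.5
shifted = deformable_conv3x3(img, kernel, offsets)
print(shifted)
```

With all offsets set to zero the function reduces to an ordinary 3x3 convolution (cross-correlation) with zero padding, which is a convenient sanity check.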
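The decoder is said to merge the branch outputs with "iterative attentional fusion". Assuming this follows the iterative attentional feature fusion (iAFF) scheme of Dai et al. ("Attentional Feature Fusion"), a multi-scale channel attention module (MS-CAM) turns the sum of two feature maps into weights M in (0, 1), the maps are blended as M·X + (1 − M)·Y, and the blend is fed through a second MS-CAM to produce refined weights. The stripped-down NumPy sketch below treats 1x1 convolutions as channel-wise matrix products, omits BatchNorm, fuses the three branches pairwise (an assumption), and does not model the KAN self-attention part of the decoder. All function names and weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ms_cam(x, params):
    """Simplified multi-scale channel attention on x of shape (C, H, W).
    1x1 convolutions are channel matmuls; BatchNorm is omitted for brevity."""
    (gw1, gw2), (lw1, lw2) = params
    g = x.mean(axis=(1, 2))                            # global context, (C,)
    g = gw2 @ np.maximum(gw1 @ g, 0.0)                 # bottleneck C -> C/r -> C
    loc = np.einsum("rc,chw->rhw", lw1, x)             # local context, per pixel
    loc = np.einsum("cr,rhw->chw", lw2, np.maximum(loc, 0.0))
    return sigmoid(loc + g[:, None, None])             # (C, H, W) weights in (0, 1)


def iterative_attentional_fusion(x, y, params_first, params_second):
    m1 = ms_cam(x + y, params_first)
    z1 = m1 * x + (1.0 - m1) * y                       # initial integration
    m2 = ms_cam(z1, params_second)
    return m2 * x + (1.0 - m2) * y                     # refined fusion


def random_params(c, r, rng):
    """Random bottleneck weights (reduction ratio r) for the global and local paths."""
    hidden = max(c // r, 1)

    def pair():
        return (0.1 * rng.standard_normal((hidden, c)), 0.1 * rng.standard_normal((c, hidden)))

    return (pair(), pair())


rng = np.random.default_rng(0)
C, H, W = 8, 12, 12
semantic, spectral, morph = (rng.standard_normal((C, H, W)) for _ in range(3))
p = [random_params(C, 4, rng) for _ in range(4)]
fused = iterative_attentional_fusion(
    iterative_attentional_fusion(semantic, spectral, p[0], p[1]), morph, p[2], p[3]
)
print(fused.shape)
```

Since the output is a per-element convex combination of the two inputs, fusion can only re-weight existing evidence, never amplify it beyond either branch; fusing identical inputs returns them unchanged.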
Source journal: Information Fusion
Subject category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months
About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.