MedFormer: hierarchical medical vision transformer with content-aware dual sparse selection attention.

IF 3.4 | Zone 3, Medicine | JCR Q2, ENGINEERING, BIOMEDICAL

Physics in Medicine and Biology | Pub Date: 2025-09-25 | DOI: 10.1088/1361-6560/ae07a1

Zunhui Xia, Hongxing Li, Libin Lan


Citations: 0

Abstract

Objective. Medical image recognition is a key aid to clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective across a range of medical recognition tasks. However, these methods face two main challenges. First, they are often task-specific and architecture-tailored, which limits their general applicability. Second, they usually either adopt full attention to model long-range dependencies, which incurs high computational cost, or rely on handcrafted sparse attention, which can lead to suboptimal performance. To address these issues, we present MedFormer, an efficient medical vision transformer built on two key ideas. Approach. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure supports hierarchical feature representation while reducing the computational load of feature maps, which benefits performance. Second, it introduces a novel content-aware Dual Sparse Selection Attention (DSSA) that improves computational efficiency and robustness to noise while maintaining high performance. As the core building block of MedFormer, DSSA is designed to explicitly attend to the most relevant content. Theoretical analysis demonstrates that MedFormer outperforms existing medical vision transformers in terms of generality and efficiency. Main results. Extensive experiments on datasets from various imaging modalities show that MedFormer consistently improves performance on all three medical image recognition tasks mentioned above. Significance. MedFormer provides an efficient and versatile solution for medical image recognition, with strong potential for clinical application. The code is available on GitHub.
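
The contrast the abstract draws between full attention and content-aware sparse attention can be illustrated with a minimal sketch. The code below is a generic top-k sparse attention in NumPy, not the authors' DSSA: the abstract does not specify how the dual selection works, so the function names, the single-stage top-k rule, and the parameter `top_k` are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    """Standard scaled dot-product attention: every query sees every key."""
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n_q, n_k)
    return softmax(scores, axis=-1) @ v          # (n_q, d_v)


def topk_sparse_attention(q, k, v, top_k):
    """Hypothetical content-aware sparse attention (illustrative only).

    Each query keeps only its `top_k` highest-scoring keys and applies
    softmax over that subset. This is NOT the published DSSA; it only
    shows the general idea of selecting the most relevant content.
    Note that this sketch still computes the full score matrix, so it
    demonstrates the selection rule rather than the compute savings.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])                        # (n_q, n_k)
    # Unordered indices of the top_k largest scores in each row.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]      # (n_q, top_k)
    selected = np.take_along_axis(scores, idx, axis=-1)            # (n_q, top_k)
    weights = softmax(selected, axis=-1)                           # (n_q, top_k)
    # v[idx] gathers the selected value vectors: (n_q, top_k, d_v).
    return np.einsum("qk,qkd->qd", weights, v[idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 4))
    k = rng.normal(size=(6, 4))
    v = rng.normal(size=(6, 3))
    print(topk_sparse_attention(q, k, v, top_k=2).shape)
```

With `top_k` equal to the number of keys, the sketch reduces to full attention. With `top_k=1`, each query simply copies the value of its best-matching key. A handcrafted sparse pattern would instead fix the attended positions in advance, independent of content.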


Source journal

Physics in Medicine and Biology (Medicine; Engineering, Biomedical)

CiteScore: 6.50

Self-citation rate: 14.30%

Articles published: 409

Review time: 2 months

Journal scope: The development and application of theoretical, computational and experimental physics to medicine, physiology and biology. Topics covered are: therapy physics (including ionizing and non-ionizing radiation); biomedical imaging (e.g. x-ray, magnetic resonance, ultrasound, optical and nuclear imaging); image-guided interventions; image reconstruction and analysis (including kinetic modelling); artificial intelligence in biomedical physics and analysis; nanoparticles in imaging and therapy; radiobiology; radiation protection and patient dose monitoring; radiation dosimetry