MedScale-Former: Self-guided multiscale transformer for medical image segmentation

Impact factor 10.7; Zone 1 (Medicine); JCR Q1 (Computer Science, Artificial Intelligence)

Medical Image Analysis. Pub Date: 2025-04-04. DOI: 10.1016/j.media.2025.103554

Sanaz Karimijafarbigloo, Reza Azad, Amirhossein Kazerouni, Dorit Merhof

Medical Image Analysis, volume 103, Article 103554. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S136184152500101X

Citations: 0

Abstract

Accurate medical image segmentation is crucial for enabling automated clinical decision procedures. However, existing supervised deep learning methods for medical image segmentation face significant challenges due to their reliance on extensive labeled training data. To address this limitation, our novel approach introduces a dual-branch transformer network operating on two scales, strategically encoding global contextual dependencies while preserving local information. To promote self-supervised learning, our method leverages semantic dependencies between different scales, generating a supervisory signal for inter-scale consistency. Additionally, it incorporates a spatial stability loss within each scale, fostering self-supervised content clustering. While intra-scale and inter-scale consistency losses enhance feature uniformity within clusters, we introduce a cross-entropy loss function atop the clustering score map to effectively model cluster distributions and refine decision boundaries. Furthermore, to account for pixel-level similarities between organ or lesion subpixels, we propose a selective kernel regional attention module as a plug-and-play component. This module adeptly captures and outlines organ or lesion regions, slightly enhancing the definition of object boundaries. Our experimental results on skin lesion, lung organ, and multiple myeloma plasma cell segmentation tasks demonstrate the superior performance of our method compared to state-of-the-art approaches.
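
The abstract names two of its training signals without giving formulas: an inter-scale consistency term between the two branches, and a cross-entropy term on the clustering score map. The sketch below is one possible reading, not the authors' implementation. It assumes that each branch outputs a per-pixel vector of cluster scores, that the coarse map is aligned to the fine map by nearest-neighbour upsampling, that consistency is measured as a mean squared difference between softmax probabilities, and that the cross-entropy target is each pixel's own argmax cluster (a pseudo-label). All function names and the nested-list data layout are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one pixel's cluster scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def upsample_nearest(score_map, factor):
    """Nearest-neighbour upsampling of an H x W x K score map (nested lists)."""
    out = []
    for row in score_map:
        wide = [pixel for pixel in row for _ in range(factor)]
        # Rows are only read afterwards, so sharing the same list is safe.
        out.extend([wide] * factor)
    return out


def inter_scale_consistency(fine, coarse, factor):
    """Assumed form: mean squared difference between the cluster
    probabilities of the fine branch and the upsampled coarse branch."""
    coarse_up = upsample_nearest(coarse, factor)
    total, count = 0.0, 0
    for row_f, row_c in zip(fine, coarse_up):
        for scores_f, scores_c in zip(row_f, row_c):
            p_f, p_c = softmax(scores_f), softmax(scores_c)
            total += sum((a - b) ** 2 for a, b in zip(p_f, p_c))
            count += 1
    return total / count


def pseudo_label_cross_entropy(score_map):
    """Assumed form: cross-entropy of each pixel's softmax against its own
    argmax cluster, which equals -log of the largest probability."""
    total, count = 0.0, 0
    for row in score_map:
        for scores in row:
            total += -math.log(max(softmax(scores)))
            count += 1
    return total / count


# Toy example: a 1 x 1 coarse map and a 2 x 2 fine map with K = 2 clusters.
coarse = [[[2.0, 0.0]]]
fine = [
    [[2.0, 0.0], [2.0, 0.0]],
    [[2.0, 0.0], [0.0, 2.0]],  # the last pixel disagrees with the coarse branch
]
print("inter-scale consistency:", inter_scale_consistency(fine, coarse, 2))
print("pseudo-label cross-entropy:", pseudo_label_cross_entropy(fine))
```

Under these assumptions the consistency term is zero when both branches assign identical cluster probabilities and grows with every pixel on which they disagree, while the cross-entropy term shrinks as each pixel commits more confidently to a single cluster.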


Source journal

Medical Image Analysis (Engineering and Technology; Engineering: Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.