AttenScribble: Attention-enhanced scribble supervision for medical image segmentation

IF 3.1 | Zone 4 (Computer Science) | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation | Pub Date: 2025-05-22 | DOI: 10.1016/j.jvcir.2025.104476

Mu Tian, Qinzhu Yang, Yi Gao

Volume 110, Article 104476. Publisher page: https://www.sciencedirect.com/science/article/pii/S1047320325000902

Citations: 0

Abstract

The success of deep networks in medical image segmentation relies heavily on large amounts of labeled training data, yet acquiring dense annotations is time-consuming. Weakly supervised methods rely on less expensive forms of supervision, among which scribbles have recently gained popularity thanks to their flexibility. Because scribbles carry no shape or boundary information, however, it is extremely challenging to train a deep network on them that generalizes to unlabeled pixels. In this paper, we present a straightforward yet effective scribble-supervised learning framework. Inspired by recent advances in transformer-based segmentation, we create a pluggable spatial self-attention module that can be attached on top of any internal feature layer of an arbitrary fully convolutional network (FCN) backbone. The module introduces global interaction while retaining the efficiency of convolutions. Building on this module, we construct a similarity metric based on normalized and symmetrized attention. This attentive similarity leads to a novel regularization loss that enforces consistency between the segmentation prediction and visual affinity, and thereby aligns the FCN encoder, the attention mapping, and the model prediction. The proposed FCN+Attention architecture can be trained end-to-end under a combination of three learning objectives: a partial segmentation loss, a customized masked conditional random field, and the proposed attentive similarity loss. Extensive experiments on public datasets (ACDC and CHAOS) show that our framework not only outperforms existing state-of-the-art methods but also delivers performance close to fully supervised benchmarks. The code is available at https://github.com/YangQinzhu/AttenScribble.git.
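The abstract names two of the objectives without giving formulas, so the following is only a minimal NumPy sketch of how they might look, not the authors' implementation. It assumes that the partial segmentation loss is a cross-entropy restricted to scribble-labeled pixels, that the attention is a row-softmax of scaled dot products on raw features (no learned query/key projections), and that the similarity loss is an affinity-weighted squared difference between per-pixel predictions. All function names and the `ignore_index` convention are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def partial_cross_entropy(probs, labels, ignore_index=-1):
    """Cross-entropy averaged over scribble-labeled pixels only.

    probs:  (N, C) per-pixel class probabilities.
    labels: (N,) integer class ids; ignore_index marks unlabeled pixels.
    """
    mask = labels != ignore_index
    if not mask.any():
        return 0.0
    picked = probs[mask, labels[mask]]  # probability of the scribbled class
    return float(-np.log(picked + 1e-12).mean())


def symmetrized_attention(features):
    """Row-normalized self-attention, then symmetrized.

    features: (N, D) flattened spatial features.
    Assumption: raw features stand in for learned query/key projections.
    """
    d = features.shape[1]
    attn = softmax(features @ features.T / np.sqrt(d), axis=-1)
    return 0.5 * (attn + attn.T)


def attentive_similarity_loss(probs, sim):
    """Hypothetical regularizer: pixels with high mutual attention
    are penalized for having different predictions."""
    diff = probs[:, None, :] - probs[None, :, :]  # (N, N, C)
    sq_dist = (diff ** 2).sum(axis=-1)            # (N, N)
    return float((sim * sq_dist).sum() / sim.sum())


# Toy usage: 6 "pixels", 4 feature channels, 3 classes, 2 scribbled pixels.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
probs = softmax(rng.normal(size=(6, 3)))
labels = np.array([0, -1, -1, 2, -1, -1])

sim = symmetrized_attention(features)
total = partial_cross_entropy(probs, labels) + attentive_similarity_loss(probs, sim)
```

In this sketch the regularizer is zero when all pixels share the same prediction and grows when strongly attending pixel pairs disagree, which is one plausible reading of "consistency between segmentation prediction and visual affinity". The masked conditional random field term is omitted.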

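Source journal

Journal of Visual Communication and Image Representation (Engineering Technology - Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months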

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.