{"title":"Feature distance-weighted adaptive decoupled knowledge distillation for medical image segmentation.","authors":"Xiangchun Yu, Ziyun Xiong, Miaomiao Liang, Lingjuan Yu, Jian Zheng","doi":"10.1007/s11548-025-03346-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This paper aims to apply decoupled knowledge distillation (DKD) to medical image segmentation, focusing on transferring knowledge from a high-performance teacher network to a lightweight student network, thereby facilitating model deployment on embedded devices.</p><p><strong>Methods: </strong>We initially decouple the distillation loss into pixel-wise target class knowledge distillation (PTCKD) and pixel-wise non-target class knowledge distillation (PNCKD). Subsequently, to address the limitations of the fixed weight paradigm in PTCKD, we propose a novel feature distance-weighted adaptive decoupled knowledge distillation (FDWA-DKD) method. FDWA-DKD quantifies the feature disparity between student and teacher, generating instance-level adaptive weights for PTCKD. We design a feature distance weighting (FDW) module that dynamically calculates feature distance to obtain adaptive weights, integrating feature space distance information into logit distillation. Lastly, we introduce a class-wise feature probability distribution loss to encourage the student to mimic the teacher's spatial distribution.</p><p><strong>Results: </strong>Extensive experiments conducted on the Synapse and FLARE22 datasets demonstrate that our proposed FDWA-DKD achieves satisfactory performance, yielding optimal Dice scores and, in some instances, surpassing the performance of the teacher network. Ablation studies further validate the effectiveness of each module within our proposed method.</p><p><strong>Conclusion: </strong>Our method overcomes the constraints of traditional distillation methods by offering instance-level adaptive learning weights tailored to PTCKD. By quantifying student-teacher feature disparity and minimizing class-wise feature probability distribution loss, our method outperforms other distillation methods.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03346-9","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Abstract
Purpose: This paper aims to apply decoupled knowledge distillation (DKD) to medical image segmentation, focusing on transferring knowledge from a high-performance teacher network to a lightweight student network, thereby facilitating model deployment on embedded devices.
Methods: We first decouple the distillation loss into pixel-wise target class knowledge distillation (PTCKD) and pixel-wise non-target class knowledge distillation (PNCKD). Then, to address the limitations of the fixed-weight paradigm in PTCKD, we propose a novel feature distance-weighted adaptive decoupled knowledge distillation (FDWA-DKD) method. FDWA-DKD quantifies the feature disparity between student and teacher and generates instance-level adaptive weights for PTCKD. We design a feature distance weighting (FDW) module that dynamically computes the feature distance to obtain these adaptive weights, thereby integrating feature-space distance information into logit distillation. Finally, we introduce a class-wise feature probability distribution loss that encourages the student to mimic the teacher's spatial feature distribution.
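The abstract does not include code, so the following PyTorch sketch is only one plausible reading of how the decoupled pixel-wise losses and the feature distance weighting could fit together. All names (fdwa_dkd_loss, class_wise_spatial_loss), tensor shapes, the sigmoid-based weight mapping, and the energy-based spatial distribution are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def fdwa_dkd_loss(s_logits, t_logits, s_feat, t_feat, target, tau=4.0, beta=8.0):
    """Pixel-wise decoupled KD with a feature-distance-weighted PTCKD term.

    s_logits, t_logits: (B, C, H, W) student/teacher segmentation logits.
    s_feat, t_feat:     (B, D, H, W) student/teacher features (matched dims assumed).
    target:             (B, H, W)    ground-truth class indices.
    """
    B, C, H, W = s_logits.shape
    logits_s = s_logits.permute(0, 2, 3, 1).reshape(-1, C)  # (N, C), N = B*H*W
    logits_t = t_logits.permute(0, 2, 3, 1).reshape(-1, C)
    y = target.reshape(-1)

    p_s = F.softmax(logits_s / tau, dim=1)
    p_t = F.softmax(logits_t / tau, dim=1)

    # PTCKD: KL over the binary split (target class vs. all non-target classes).
    gt_mask = F.one_hot(y, C).bool()
    pt_s = p_s[gt_mask].clamp(1e-7, 1 - 1e-7)  # (N,) target-class probability
    pt_t = p_t[gt_mask].clamp(1e-7, 1 - 1e-7)
    ptckd = pt_t * (pt_t / pt_s).log() + (1 - pt_t) * ((1 - pt_t) / (1 - pt_s)).log()

    # PNCKD: KL over the C-1 non-target classes, renormalized without the target.
    log_q_s = F.log_softmax(logits_s.masked_fill(gt_mask, -1e9) / tau, dim=1)
    q_t = F.softmax(logits_t.masked_fill(gt_mask, -1e9) / tau, dim=1)
    pnckd = F.kl_div(log_q_s, q_t, reduction="none").sum(1)  # (N,)

    # FDW: instance-level adaptive weight from the student-teacher feature gap
    # (assumed mapping: 1 + sigmoid of the normalized per-instance L2 distance).
    d = (s_feat - t_feat).flatten(1).norm(dim=1) / t_feat.flatten(1).norm(dim=1).clamp_min(1e-7)
    alpha = (1.0 + torch.sigmoid(d)).repeat_interleave(H * W)  # (N,), one weight per pixel

    return (tau ** 2) * (alpha * ptckd + beta * pnckd).mean()


def class_wise_spatial_loss(s_feat, t_feat, target, num_classes):
    """Class-wise feature probability distribution loss (assumed form): per class,
    the per-pixel feature energy inside that class's region is normalized into a
    spatial distribution, and the student matches the teacher via KL divergence."""
    e_s = s_feat.pow(2).mean(1).flatten(1)  # (B, H*W) per-pixel feature energy
    e_t = t_feat.pow(2).mean(1).flatten(1)
    y = target.flatten(1)
    loss, n_cls = e_s.new_zeros(()), 0
    for c in range(num_classes):
        m = y == c  # (B, H*W) class-c region
        if not m.any():
            continue
        log_d_s = F.log_softmax(e_s.masked_fill(~m, -1e9), dim=1)
        d_t = F.softmax(e_t.masked_fill(~m, -1e9), dim=1)
        loss = loss + F.kl_div(log_d_s, d_t, reduction="batchmean")
        n_cls += 1
    return loss / max(n_cls, 1)
```

In training, the total objective would presumably combine the usual supervised segmentation loss with these distillation terms, keeping the PNCKD weight (beta) fixed while the PTCKD weight adapts per instance, as described above.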
Results: Extensive experiments on the Synapse and FLARE22 datasets demonstrate that our proposed FDWA-DKD achieves strong performance, yielding the best Dice scores among the compared distillation methods and, in some instances, surpassing the teacher network itself. Ablation studies further validate the effectiveness of each module of our proposed method.
Conclusion: Our method overcomes the fixed-weight constraint of traditional distillation methods by providing instance-level adaptive learning weights tailored to PTCKD. By quantifying the student-teacher feature disparity and minimizing a class-wise feature probability distribution loss, our method outperforms other distillation methods.
Journal Introduction:
The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.