Bo Lu;Xiangxing Zheng;Zhenjie Zhu;Yuhao Guo;Ziyi Wang;Bruce X. B. Yu;Mingchuan Zhou;Peng Qi;Huicong Liu;Yunhui Liu;Lining Sun
{"title":"PLDKD-Net:基于图的视觉分析的手术场景分割的像素级判别知识蒸馏","authors":"Bo Lu;Xiangxing Zheng;Zhenjie Zhu;Yuhao Guo;Ziyi Wang;Bruce X. B. Yu;Mingchuan Zhou;Peng Qi;Huicong Liu;Yunhui Liu;Lining Sun","doi":"10.1109/TIM.2025.3606028","DOIUrl":null,"url":null,"abstract":"Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity with intricate tissues and surgical tools under varying conditions challenges the balance between segmentation accuracy and efficiency. To resolve this problem, we propose a pixel-level discriminative knowledge distillation network (PLDKD-Net), a novel pixel-level student–teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features with a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) is proposed to assess the teacher’s performance by discriminatively evaluating its probability map and the raw image, generating a confidence map that can facilitate a selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multiscale and interspatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which can adaptively tune the fusion degrees of multilatent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, and in-house surgical videos. Benefiting from our schemes, the experimental outcomes demonstrate superior quantitative and qualitative performance compared to state-of-the-art (SOTA) methods. With the selective KD mechanism, our model yields competitive or even higher performance than the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-14"},"PeriodicalIF":5.9000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PLDKD-Net: Pixel-Level Discriminative Knowledge Distillation for Surgical Scene Segmentation With Graph-Based Visual Parsing\",\"authors\":\"Bo Lu;Xiangxing Zheng;Zhenjie Zhu;Yuhao Guo;Ziyi Wang;Bruce X. B. Yu;Mingchuan Zhou;Peng Qi;Huicong Liu;Yunhui Liu;Lining Sun\",\"doi\":\"10.1109/TIM.2025.3606028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity with intricate tissues and surgical tools under varying conditions challenges the balance between segmentation accuracy and efficiency. 
To resolve this problem, we propose a pixel-level discriminative knowledge distillation network (PLDKD-Net), a novel pixel-level student–teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features with a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) is proposed to assess the teacher’s performance by discriminatively evaluating its probability map and the raw image, generating a confidence map that can facilitate a selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multiscale and interspatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which can adaptively tune the fusion degrees of multilatent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, and in-house surgical videos. Benefiting from our schemes, the experimental outcomes demonstrate superior quantitative and qualitative performance compared to state-of-the-art (SOTA) methods. With the selective KD mechanism, our model yields competitive or even higher performance than the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.\",\"PeriodicalId\":13341,\"journal\":{\"name\":\"IEEE Transactions on Instrumentation and Measurement\",\"volume\":\"74 \",\"pages\":\"1-14\"},\"PeriodicalIF\":5.9000,\"publicationDate\":\"2025-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Instrumentation and Measurement\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11151592/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11151592/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
PLDKD-Net: Pixel-Level Discriminative Knowledge Distillation for Surgical Scene Segmentation With Graph-Based Visual Parsing
Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity, with its intricate tissues and surgical tools under varying imaging conditions, makes it difficult to balance segmentation accuracy against efficiency. To resolve this problem, we propose a pixel-level discriminative knowledge distillation network (PLDKD-Net), a novel pixel-level student–teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features through a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) assesses the teacher’s performance by discriminatively evaluating its probability map together with the raw image, producing a confidence map that facilitates selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multiscale and interspatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which adaptively tunes the fusion degrees of multi-latent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, as well as on in-house surgical videos. The experimental outcomes demonstrate superior quantitative and qualitative performance compared with state-of-the-art (SOTA) methods. With the selective KD mechanism, our model yields performance competitive with, or even higher than, the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.
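To make the confidence-weighted, pixel-level distillation idea concrete, below is a minimal PyTorch sketch of such a loss. It is an illustration only, not the authors' implementation: here the confidence map is approximated by the teacher's per-pixel maximum softmax probability, whereas the paper's PCG is a learned module that also evaluates the raw image; the function name, tensor shapes, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Pixel-level KD loss weighted by a per-pixel confidence map.

    Sketch under stated assumptions: the confidence map here is the
    teacher's per-pixel max softmax probability, standing in for the
    learned pixel-level confidence generator (PCG) described in the paper.

    Args:
        student_logits: (N, C, H, W) raw scores from the student model.
        teacher_logits: (N, C, H, W) raw scores from the teacher model.
        temperature: softening temperature for the distilled distributions.
    """
    # Soften both class distributions with the temperature.
    t_prob = F.softmax(teacher_logits / temperature, dim=1)
    s_log_prob = F.log_softmax(student_logits / temperature, dim=1)

    # Per-pixel KL divergence between teacher and student, shape (N, H, W).
    kl_map = (t_prob * (t_prob.clamp_min(1e-8).log() - s_log_prob)).sum(dim=1)

    # Stand-in confidence map: teacher's max class probability per pixel,
    # so pixels where the teacher is uncertain contribute less to the loss.
    confidence = F.softmax(teacher_logits, dim=1).max(dim=1).values

    # Confidence-weighted mean, scaled by T^2 as in standard KD.
    loss = (confidence * kl_map).sum() / confidence.sum().clamp_min(1e-8)
    return loss * temperature ** 2
```

In training, a term like this would typically be combined with a standard cross-entropy loss against ground-truth segmentation masks; the heterogeneous bi-stream student and the RGCN-based feature fusion are separate architectural components not shown in this sketch.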
Journal Introduction:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor, and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality, and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development, and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning, and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support for the establishment and maintenance of technical standards in the field of instrumentation and measurement.