Bo Lu;Xiangxing Zheng;Zhenjie Zhu;Yuhao Guo;Ziyi Wang;Bruce X. B. Yu;Mingchuan Zhou;Peng Qi;Huicong Liu;Yunhui Liu;Lining Sun
{"title":"PLDKD-Net:基于图的视觉分析的手术场景分割的像素级判别知识蒸馏","authors":"Bo Lu;Xiangxing Zheng;Zhenjie Zhu;Yuhao Guo;Ziyi Wang;Bruce X. B. Yu;Mingchuan Zhou;Peng Qi;Huicong Liu;Yunhui Liu;Lining Sun","doi":"10.1109/TIM.2025.3606028","DOIUrl":null,"url":null,"abstract":"Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity with intricate tissues and surgical tools under varying conditions challenges the balance between segmentation accuracy and efficiency. To resolve this problem, we propose a pixel-level discriminative knowledge distillation network (PLDKD-Net), a novel pixel-level student–teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features with a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) is proposed to assess the teacher’s performance by discriminatively evaluating its probability map and the raw image, generating a confidence map that can facilitate a selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multiscale and interspatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which can adaptively tune the fusion degrees of multilatent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, and in-house surgical videos. Benefiting from our schemes, the experimental outcomes demonstrate superior quantitative and qualitative performance compared to state-of-the-art (SOTA) methods. With the selective KD mechanism, our model yields competitive or even higher performance than the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-14"},"PeriodicalIF":5.9000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PLDKD-Net: Pixel-Level Discriminative Knowledge Distillation for Surgical Scene Segmentation With Graph-Based Visual Parsing\",\"authors\":\"Bo Lu;Xiangxing Zheng;Zhenjie Zhu;Yuhao Guo;Ziyi Wang;Bruce X. B. Yu;Mingchuan Zhou;Peng Qi;Huicong Liu;Yunhui Liu;Lining Sun\",\"doi\":\"10.1109/TIM.2025.3606028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity with intricate tissues and surgical tools under varying conditions challenges the balance between segmentation accuracy and efficiency. 
To resolve this problem, we propose a pixel-level discriminative knowledge distillation network (PLDKD-Net), a novel pixel-level student–teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features with a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) is proposed to assess the teacher’s performance by discriminatively evaluating its probability map and the raw image, generating a confidence map that can facilitate a selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multiscale and interspatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which can adaptively tune the fusion degrees of multilatent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, and in-house surgical videos. Benefiting from our schemes, the experimental outcomes demonstrate superior quantitative and qualitative performance compared to state-of-the-art (SOTA) methods. With the selective KD mechanism, our model yields competitive or even higher performance than the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.\",\"PeriodicalId\":13341,\"journal\":{\"name\":\"IEEE Transactions on Instrumentation and Measurement\",\"volume\":\"74 \",\"pages\":\"1-14\"},\"PeriodicalIF\":5.9000,\"publicationDate\":\"2025-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Instrumentation and Measurement\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11151592/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11151592/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
PLDKD-Net: Pixel-Level Discriminative Knowledge Distillation for Surgical Scene Segmentation With Graph-Based Visual Parsing
Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity, with its intricate tissues and surgical tools under varying imaging conditions, makes it difficult to balance segmentation accuracy against efficiency. To resolve this problem, we propose a pixel-level discriminative knowledge distillation network (PLDKD-Net), a novel pixel-level student–teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features through a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) assesses the teacher’s performance by discriminatively evaluating its probability map together with the raw image, producing a confidence map that facilitates selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multiscale and interspatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which adaptively tunes the fusion degrees of multi-latent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, as well as on in-house surgical videos. The experimental outcomes demonstrate superior quantitative and qualitative performance compared with state-of-the-art (SOTA) methods. With the selective KD mechanism, our model yields performance competitive with, or even higher than, the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.
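To make the confidence-weighted, pixel-level distillation idea concrete, below is a minimal PyTorch sketch of such a loss. It is an illustration only, not the authors' implementation: here the confidence map is approximated by the teacher's per-pixel maximum softmax probability, whereas the paper's PCG is a learned module that also evaluates the raw image; the function name, tensor shapes, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Pixel-level KD loss weighted by a per-pixel confidence map.

    Sketch under stated assumptions: the confidence map here is the
    teacher's per-pixel max softmax probability, standing in for the
    learned pixel-level confidence generator (PCG) described in the paper.

    Args:
        student_logits: (N, C, H, W) raw scores from the student model.
        teacher_logits: (N, C, H, W) raw scores from the teacher model.
        temperature: softening temperature for the distilled distributions.
    """
    # Soften both class distributions with the temperature.
    t_prob = F.softmax(teacher_logits / temperature, dim=1)
    s_log_prob = F.log_softmax(student_logits / temperature, dim=1)

    # Per-pixel KL divergence between teacher and student, shape (N, H, W).
    kl_map = (t_prob * (t_prob.clamp_min(1e-8).log() - s_log_prob)).sum(dim=1)

    # Stand-in confidence map: teacher's max class probability per pixel,
    # so pixels where the teacher is uncertain contribute less to the loss.
    confidence = F.softmax(teacher_logits, dim=1).max(dim=1).values

    # Confidence-weighted mean, scaled by T^2 as in standard KD.
    loss = (confidence * kl_map).sum() / confidence.sum().clamp_min(1e-8)
    return loss * temperature ** 2
```

In training, a term like this would typically be combined with a standard cross-entropy loss against ground-truth segmentation masks; the heterogeneous bi-stream student and the RGCN-based feature fusion are separate architectural components not shown in this sketch.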
Journal Introduction:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor, and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality, and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development, and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning, and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support for the establishment and maintenance of technical standards in the field of instrumentation and measurement.