SRMA-KD: Structured relational multi-scale attention knowledge distillation for effective lightweight cardiac image segmentation
Bo Chen, Youhao Huang, Yufan Liu, Dong Sui, Fei Yang
Image and Vision Computing, volume 159, Article 105577. Published 2025-04-30. DOI: 10.1016/j.imavis.2025.105577
Impact factor 4.2; JCR Q2 (Computer Science, Artificial Intelligence)
https://www.sciencedirect.com/science/article/pii/S0262885625001659
Cardiac image segmentation is essential for accurately extracting structural information about the heart, aiding in precise diagnosis and personalized treatment planning. However, real-time segmentation on medical devices demands computational efficiency that often conflicts with the intensive processing and storage requirements of deep learning algorithms. These algorithms are frequently hindered by their complex models and extensive parameter sets, which limit their feasibility in clinical settings with constrained resources. Meanwhile, the performance of lightweight heart segmentation models still requires enhancement. This study introduces the SRMA-KD framework, a knowledge distillation approach for cardiac image segmentation designed to achieve high accuracy with lightweight models. The framework efficiently transfers semantic feature information and structural knowledge from a teacher model to a student model, enabling effective segmentation within clinical resource limitations. The SRMA-KD framework includes three key modules: the Global Structural Relational Block (GSRB), the Multi-scale Feature Attention Block (MFAB), and the Prediction Difference Transfer Block (PDTB). The GSRB correlates the outputs of the teacher and student networks with the ground truth, transferring structural correlations to enhance the student network's global feature learning. The MFAB enables the student network to learn multi-scale feature extraction from the teacher network, focusing on relevant semantic regions. The PDTB minimizes pixel-level differences between the segmentation outputs of the teacher and student networks. Our experiments demonstrate that the SRMA-KD framework significantly improves the segmentation accuracy of the student network compared to other medical imaging knowledge distillation methods, highlighting its potential as an effective solution for cardiac image segmentation in resource-limited clinical environments.
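The abstract does not give the loss formulas, so the following is only a generic illustration of pixel-level teacher-to-student transfer, not the authors' PDTB. It assumes a common formulation: temperature-softened class distributions per pixel, compared with a KL divergence averaged over pixels. The function names, the temperature value, and the list-of-lists input layout are all hypothetical choices made for this sketch.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def pixelwise_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over pixels.

    Assumed layout (hypothetical): each argument is a list of pixels,
    and each pixel is a list of per-class logits.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("inputs must be non-empty and cover the same pixels")

    total = 0.0
    for t_pixel, s_pixel in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_pixel, temperature)  # teacher distribution
        log_q = log_softmax(s_pixel, temperature)  # student distribution
        # KL divergence for this pixel: sum_c p_c * (log p_c - log q_c)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)


# Two pixels, three classes each (made-up logits for illustration only).
teacher = [[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]]
student = [[2.0, 1.5, 0.5], [0.3, 1.0, 0.9]]
loss = pixelwise_kd_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's per-pixel distributions and grows as they diverge. In practice a term like this would typically be combined with a supervised segmentation loss against the ground truth; how SRMA-KD weights its three modules is not stated in the abstract.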
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.