SwinCVS: a unified approach to classifying critical view of safety structures in laparoscopic cholecystectomy
Franciszek M Nowak, Evangelos B Mazomenos, Brian Davidson, Matthew J Clarkson
International Journal of Computer Assisted Radiology and Surgery, published 2025-04-11. DOI: 10.1007/s11548-025-03354-9
Abstract
Purpose: Laparoscopic cholecystectomy is one of the most commonly performed surgeries in the UK. Although the procedure is generally safe, the sheer volume of operations leads to a notable number of complications; surgical errors are often mitigated by the critical view of safety (CVS) technique. However, reliably achieving CVS intraoperatively can be challenging. Current state-of-the-art models for automated CVS evaluation rely on complex, multistage training and semantic segmentation masks, restricting their adaptability and limiting further performance improvements.
Methods: We propose SwinCVS, a spatiotemporal architecture designed for end-to-end training. SwinCVS combines the SwinV2 image encoder with an LSTM for robust CVS classification. We evaluated three backbones (SwinV2, VMamba, and ResNet50) to assess their ability to encode surgical images. The SwinCVS model was evaluated in both an end-to-end variant and a pretrained variant, with performance statistically compared against the current state of the art, SV2LSTG, on the Endoscapes dataset.
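For illustration only, the sketch below shows one way a SwinV2 per-frame encoder can be paired with an LSTM for multi-label CVS classification, in the spirit of the architecture described above. The specific timm backbone name ("swinv2_base_window12_192"), the hidden size, clip length, and classification head are assumptions for the sketch, not details taken from the paper.

```python
# Minimal sketch of a SwinV2 + LSTM clip classifier (assumed configuration,
# not the authors' released implementation).
import torch
import torch.nn as nn
import timm


class SwinLSTMClassifier(nn.Module):
    def __init__(self, num_criteria: int = 3, hidden_size: int = 512,
                 pretrained: bool = True):
        super().__init__()
        # SwinV2 backbone used as a per-frame feature extractor
        # (num_classes=0 drops the classification head and returns pooled features).
        self.encoder = timm.create_model(
            "swinv2_base_window12_192", pretrained=pretrained, num_classes=0
        )
        feat_dim = self.encoder.num_features
        # LSTM aggregates per-frame features over the clip's temporal dimension.
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        # One logit per CVS criterion (multi-label classification).
        self.head = nn.Linear(hidden_size, num_criteria)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.encoder(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        temporal, _ = self.lstm(feats)
        # Classify from the last time step's hidden state.
        return self.head(temporal[:, -1])


if __name__ == "__main__":
    model = SwinLSTMClassifier(pretrained=False)  # skip weight download for a quick check
    dummy = torch.randn(2, 4, 3, 192, 192)        # 2 clips of 4 frames each
    logits = model(dummy)                         # shape: (2, 3)
    print(logits.shape)
```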
Results: SwinV2 proved to be the best encoder, achieving +2.07% and +17.72% mAP over VMamba and ResNet50, respectively. SwinCVS trained end-to-end achieves 64.59% mAP and performs on par with SV2LSTG (64.68% mAP, p=0.470), while its pretrained variant achieves 67.45% mAP, a significant improvement over the current state of the art (SOTA).
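As a reminder of how the reported metric is computed, the short sketch below averages per-criterion average precision into a mean average precision (mAP) for multi-label CVS predictions. The toy labels and scores are hypothetical, and the use of scikit-learn is an assumption about tooling, not the authors' evaluation code.

```python
# mAP as the mean of per-criterion average precision (hypothetical toy data).
import numpy as np
from sklearn.metrics import average_precision_score

# 5 frames, 3 CVS criteria: ground-truth labels and predicted scores.
y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 0]])
y_score = np.array([[0.9, 0.2, 0.8], [0.3, 0.1, 0.7],
                    [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.7, 0.4, 0.2]])

per_criterion_ap = [average_precision_score(y_true[:, k], y_score[:, k])
                    for k in range(y_true.shape[1])]
print("mAP:", float(np.mean(per_criterion_ap)))
```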
Conclusion: Our proposed solution offers a promising approach to CVS classification, outperforming existing methods and eliminating the need for semantic segmentation masks. Its design supports robust feature extraction and allows for future enhancements through additional tasks that enforce clinically relevant priors. The results highlight that attention-based architectures such as SwinV2 are well suited for surgical image encoding, offering a practical approach for improving automated systems in laparoscopic surgery.
Journal Introduction:
The International Journal of Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines and encourages interdisciplinary research and development activities in an international environment.