SwinCVS: a unified approach to classifying critical view of safety structures in laparoscopic cholecystectomy
Franciszek M Nowak, Evangelos B Mazomenos, Brian Davidson, Matthew J Clarkson
International Journal of Computer Assisted Radiology and Surgery, published 2025-04-11. DOI: 10.1007/s11548-025-03354-9
Abstract
Purpose: Laparoscopic cholecystectomy is one of the most commonly performed surgeries in the UK. Although the procedure is generally safe, the sheer volume of operations leads to a notable number of complications; surgical errors are often mitigated by the critical view of safety (CVS) technique. However, reliably achieving CVS intraoperatively can be challenging. Current state-of-the-art models for automated CVS evaluation rely on complex, multistage training and semantic segmentation masks, restricting their adaptability and limiting further performance improvements.
Methods: We propose SwinCVS, a spatiotemporal architecture designed for end-to-end training. SwinCVS combines the SwinV2 image encoder with an LSTM for robust CVS classification. We evaluated three backbones (SwinV2, VMamba, and ResNet50) to assess their ability to encode surgical images. The SwinCVS model was evaluated in both an end-to-end variant and a pretrained variant, with performance statistically compared against the current state of the art, SV2LSTG, on the Endoscapes dataset.
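For illustration only, the sketch below shows one way a SwinV2 per-frame encoder can be paired with an LSTM for multi-label CVS classification, in the spirit of the architecture described above. The specific timm backbone name ("swinv2_base_window12_192"), the hidden size, clip length, and classification head are assumptions for the sketch, not details taken from the paper.

```python
# Minimal sketch of a SwinV2 + LSTM clip classifier (assumed configuration,
# not the authors' released implementation).
import torch
import torch.nn as nn
import timm


class SwinLSTMClassifier(nn.Module):
    def __init__(self, num_criteria: int = 3, hidden_size: int = 512,
                 pretrained: bool = True):
        super().__init__()
        # SwinV2 backbone used as a per-frame feature extractor
        # (num_classes=0 drops the classification head and returns pooled features).
        self.encoder = timm.create_model(
            "swinv2_base_window12_192", pretrained=pretrained, num_classes=0
        )
        feat_dim = self.encoder.num_features
        # LSTM aggregates per-frame features over the clip's temporal dimension.
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        # One logit per CVS criterion (multi-label classification).
        self.head = nn.Linear(hidden_size, num_criteria)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.encoder(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        temporal, _ = self.lstm(feats)
        # Classify from the last time step's hidden state.
        return self.head(temporal[:, -1])


if __name__ == "__main__":
    model = SwinLSTMClassifier(pretrained=False)  # skip weight download for a quick check
    dummy = torch.randn(2, 4, 3, 192, 192)        # 2 clips of 4 frames each
    logits = model(dummy)                         # shape: (2, 3)
    print(logits.shape)
```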
Results: SwinV2 proved to be the best encoder, achieving +2.07% and +17.72% mAP over VMamba and ResNet50, respectively. SwinCVS trained end-to-end achieves 64.59% mAP and performs on par with SV2LSTG (64.68% mAP, p=0.470), while its pretrained variant achieves 67.45% mAP, a significant improvement over the current state of the art (SOTA).
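As a reminder of how the reported metric is computed, the short sketch below averages per-criterion average precision into a mean average precision (mAP) for multi-label CVS predictions. The toy labels and scores are hypothetical, and the use of scikit-learn is an assumption about tooling, not the authors' evaluation code.

```python
# mAP as the mean of per-criterion average precision (hypothetical toy data).
import numpy as np
from sklearn.metrics import average_precision_score

# 5 frames, 3 CVS criteria: ground-truth labels and predicted scores.
y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 0]])
y_score = np.array([[0.9, 0.2, 0.8], [0.3, 0.1, 0.7],
                    [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.7, 0.4, 0.2]])

per_criterion_ap = [average_precision_score(y_true[:, k], y_score[:, k])
                    for k in range(y_true.shape[1])]
print("mAP:", float(np.mean(per_criterion_ap)))
```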
Conclusion: Our proposed solution offers a promising approach to CVS classification, outperforming existing methods and eliminating the need for semantic segmentation masks. Its design supports robust feature extraction and allows for future enhancements through additional tasks that enforce clinically relevant priors. The results highlight that attention-based architectures such as SwinV2 are well suited for surgical image encoding, offering a practical approach for improving automated systems in laparoscopic surgery.
Journal Introduction:
The International Journal of Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines and encourages interdisciplinary research and development activities in an international environment.