SASVi: segment any surgical video.

IF 2.3 3区医学 Q3 ENGINEERING, BIOMEDICAL

International Journal of Computer Assisted Radiology and Surgery Pub Date : 2025-05-20 DOI:10.1007/s11548-025-03408-y

Ssharvien Kumar Sivakumar, Yannik Frisch, Amin Ranem, Anirban Mukhopadhyay

{"title":"SASVi: segment any surgical video.","authors":"Ssharvien Kumar Sivakumar, Yannik Frisch, Amin Ranem, Anirban Mukhopadhyay","doi":"10.1007/s11548-025-03408-y","DOIUrl":null,"url":null,"abstract":"Purpose: Foundation models, trained on multitudes of public datasets, often require additional fine-tuning or re-prompting mechanisms to be applied to visually distinct target domains such as surgical videos. Further, without domain knowledge, they cannot model the specific semantics of the target domain. Hence, when applied to surgical video segmentation, they fail to generalise to sections where previously tracked objects leave the scene or new objects enter.Methods: We propose SASVi, a novel re-prompting mechanism based on a frame-wise object detection Overseer model, which is trained on a minimal amount of scarcely available annotations for the target domain. This model automatically re-prompts the foundation model SAM2 when the scene constellation changes, allowing for temporally smooth and complete segmentation of full surgical videos.Results: Re-prompting based on our Overseer model significantly improves the temporal consistency of surgical video segmentation compared to similar prompting techniques and especially frame-wise segmentation, which neglects temporal information, by at least 2.4%. Our proposed approach allows us to successfully deploy SAM2 to surgical videos, which we quantitatively and qualitatively demonstrate for three different cholecystectomy and cataract surgery datasets.Conclusion: SASVi can serve as a new baseline for smooth and temporally consistent segmentation of surgical videos with scarcely available annotation data. Our method allows us to leverage scarce annotations and obtain complete annotations for full videos of the large-scale counterpart datasets. We make those annotations publicly available, providing extensive annotation data for the future development of surgical data science models.","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03408-y","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}

引用次数: 0

Abstract

Purpose: Foundation models, trained on multitudes of public datasets, often require additional fine-tuning or re-prompting mechanisms to be applied to visually distinct target domains such as surgical videos. Further, without domain knowledge, they cannot model the specific semantics of the target domain. Hence, when applied to surgical video segmentation, they fail to generalise to sections where previously tracked objects leave the scene or new objects enter.

Methods: We propose SASVi, a novel re-prompting mechanism based on a frame-wise object detection Overseer model, which is trained on a minimal amount of scarcely available annotations for the target domain. This model automatically re-prompts the foundation model SAM2 when the scene constellation changes, allowing for temporally smooth and complete segmentation of full surgical videos.

Results: Re-prompting based on our Overseer model significantly improves the temporal consistency of surgical video segmentation compared to similar prompting techniques and especially frame-wise segmentation, which neglects temporal information, by at least 2.4%. Our proposed approach allows us to successfully deploy SAM2 to surgical videos, which we quantitatively and qualitatively demonstrate for three different cholecystectomy and cataract surgery datasets.

Conclusion: SASVi can serve as a new baseline for smooth and temporally consistent segmentation of surgical videos with scarcely available annotation data. Our method allows us to leverage scarce annotations and obtain complete annotations for full videos of the large-scale counterpart datasets. We make those annotations publicly available, providing extensive annotation data for the future development of surgical data science models.

查看原文本刊更多论文

SASVi：分割任何手术视频。

目的：在大量公共数据集上训练的基础模型，通常需要额外的微调或重新提示机制，以应用于视觉上不同的目标域，如手术视频。此外，如果没有领域知识，它们就不能对目标领域的特定语义进行建模。因此，当应用于外科手术视频分割时，它们无法推广到先前跟踪的对象离开场景或新对象进入的部分。方法：我们提出了SASVi，这是一种基于逐帧目标检测监督模型的新型重新提示机制，该模型在目标域的最少量几乎不可用的注释上进行训练。当场景星座发生变化时，该模型会自动重新提示基础模型SAM2，从而对整个手术视频进行暂时平滑和完整的分割。结果：与类似的提示技术相比，基于我们的Overseer模型的重新提示显著提高了手术视频分割的时间一致性，尤其是忽略时间信息的帧分割，至少提高了2.4%。我们提出的方法使我们能够成功地将SAM2部署到手术视频中，我们定量和定性地演示了三种不同的胆囊切除术和白内障手术数据集。结论：SASVi可以作为新的基线，在几乎没有注释数据的情况下，对手术视频进行平滑和时间一致的分割。我们的方法允许我们利用稀缺的注释，并获得大规模对应数据集的完整视频的完整注释。我们将这些注释公开提供，为外科数据科学模型的未来发展提供广泛的注释数据。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Computer Assisted Radiology and Surgery ENGINEERING, BIOMEDICAL-RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

CiteScore

5.90

自引率

6.70%

发文量

243

审稿时长

6-12 weeks

期刊介绍： The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.