Augmenting efficient real-time surgical instrument segmentation in video with point tracking and Segment Anything

Impact Factor: 2.8 · JCR Q3 (Engineering, Biomedical)
Zijian Wu, Adam Schmidt, Peter Kazanzides, Septimiu E. Salcudean
DOI: 10.1049/htl2.12111
Journal: Healthcare Technology Letters, vol. 12, no. 1
Published: 2024-12-30
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730982/pdf/
Citations: 0

Abstract

The Segment Anything model (SAM) is a powerful vision foundation model that is revolutionizing the traditional paradigm of segmentation. Despite this, its reliance on per-frame prompting and its large computational cost limit its use in robotically assisted surgery. Applications such as augmented-reality guidance require minimal user intervention and efficient inference to be clinically usable. This study addresses these limitations by adopting lightweight SAM variants to meet the efficiency requirement and by employing fine-tuning techniques to enhance their generalization in surgical scenes. Recent advancements in tracking any point have shown promising results in both accuracy and efficiency, particularly when points are occluded or leave the field of view. Inspired by this progress, a novel framework is presented that combines an online point tracker with a lightweight SAM model fine-tuned for surgical instrument segmentation. Sparse points within the region of interest are tracked and used to prompt SAM throughout the video sequence, providing temporal consistency. The quantitative results surpass the state-of-the-art semi-supervised video object segmentation method XMem on the EndoVis 2015 dataset, with 84.8 IoU and 91.0 Dice. The method achieves promising performance comparable to XMem and to transformer-based fully supervised segmentation methods on the ex vivo UCL dVRK and in vivo CholecSeg8k datasets. In addition, the proposed method shows promising zero-shot generalization ability on the label-free STIR dataset. In terms of efficiency, the method was tested on a single GeForce RTX 4060 and a single RTX 4090 GPU, achieving inference speeds of over 25 and 90 FPS, respectively. Code is available at: https://github.com/zijianwu1231/SIS-PT-SAM.
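The core loop the abstract describes — track sparse points frame to frame, then use them as prompts for a point-promptable segmenter — can be sketched as below. This is a minimal illustration, not the authors' implementation: `track_points` and `segment_from_points` are hypothetical stand-ins for an online point tracker (e.g. a CoTracker-style model) and a fine-tuned lightweight SAM, respectively.

```python
import numpy as np

def track_points(prev_points, displacement):
    """Stand-in point tracker: advance each query point by a per-point
    displacement. A real system would use an online tracker that also
    handles occlusion and points leaving the field of view."""
    return prev_points + displacement

def segment_from_points(points, shape):
    """Stand-in for a point-prompted SAM variant: here we simply fill the
    bounding box of the prompt points. A fine-tuned lightweight SAM would
    instead return an instrument mask conditioned on these prompts."""
    mask = np.zeros(shape, dtype=bool)
    (r0, c0), (r1, c1) = points.min(axis=0), points.max(axis=0)
    mask[int(r0):int(r1) + 1, int(c0):int(c1) + 1] = True
    return mask

def segment_video(per_frame_displacements, init_points, shape):
    """Propagate sparse prompt points through a sequence and prompt the
    segmenter once per frame, giving temporally consistent masks without
    any per-frame user interaction."""
    points, masks = init_points.astype(float), []
    for disp in per_frame_displacements:
        points = track_points(points, disp)
        masks.append(segment_from_points(points, shape))
    return masks

# Toy sequence: 3 frames, two points drifting 2 px down-right each frame.
init = np.array([[10.0, 10.0], [20.0, 30.0]])
disps = [np.full((2, 2), 2.0)] * 3
masks = segment_video(disps, init, shape=(64, 64))
```

The key design point, per the abstract, is that only the first frame needs user (or detector) input; every later frame is prompted automatically by the tracked points.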

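The IoU and Dice scores reported above (84.8 and 91.0 on EndoVis 2015) follow the standard binary-mask definitions, which can be computed as follows. This is a generic metric sketch, not code from the paper's repository.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union: |A ∩ B| / |A ∪ B| for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0

# Toy example: predicted mask covers the top half, ground truth the left
# half of a 4x4 grid; they overlap in the top-left 2x2 block.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :] = True
gt = np.zeros((4, 4), dtype=bool); gt[:, :2] = True
print(iou(pred, gt))   # 4 / 12
print(dice(pred, gt))  # 8 / 16
```

Note that Dice is always at least as large as IoU for the same masks, which is consistent with the paper reporting 91.0 Dice alongside 84.8 IoU.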

Source journal
Healthcare Technology Letters
Category: Health Professions – Health Information Management
CiteScore: 6.10
Self-citation rate: 4.80%
Articles per year: 12
Review time: 22 weeks
Journal description: Healthcare Technology Letters aims to bring together an audience of biomedical and electrical engineers, physical and computer scientists, and mathematicians to enable the exchange of the latest ideas and advances through rapid online publication of original healthcare technology research. Major themes of the journal include (but are not limited to):

Major technological/methodological areas:
- Biomedical signal processing
- Biomedical imaging and image processing
- Bioinstrumentation (sensors, wearable technologies, etc.)
- Biomedical informatics

Major application areas:
- Cardiovascular and respiratory systems engineering
- Neural engineering, neuromuscular systems
- Rehabilitation engineering
- Bio-robotics, surgical planning and biomechanics
- Therapeutic and diagnostic systems, devices and technologies
- Clinical engineering
- Healthcare information systems, telemedicine, mHealth