Title: A weakly supervised method for surgical scene components detection with visual foundation model
Authors: Xiaoyan Zhang, Jingyi Feng, Qian Zhang, Liming Wu, Yichen Zhu, Ziyu Zhou, Jiquan Liu, Huilong Duan
Journal: PLOS ONE, 20(5): e0322751, published 2025-05-27 (eCollection)
DOI: 10.1371/journal.pone.0322751
Citations: 0
Abstract
Purpose: Detecting crucial components is a fundamental problem in surgical scene understanding. Limited by the high cost of spatial annotation, current studies mainly focus on recognizing the three elements of the surgical triplet ⟨instrument, verb, target⟩, while detecting the surgical components ⟨instrument, target⟩ remains highly challenging. Some efforts have been made to detect surgical components, but they have two limitations: (1) detection performance depends heavily on the quantity of manual spatial annotations; (2) no previous study has investigated the detection of targets.
Methods: We introduce a weakly supervised method for detecting key components that combines a surgical triplet recognition model with the Segment Anything Model (SAM), a visual foundation model. First, with appropriate prompts, SAM generates candidate regions for surgical components. Next, we obtain a preliminary localization of each component by extracting the positive activation areas of the recognition model's class activation maps. However, using the instrument's class activation as a positional attention guide for target recognition introduces positional deviations in the target's resulting positive activation. To address this, we propose RDV-AGC, which adds an Attention Guide Correction (AGC) module that adjusts the attention guidance for the target according to the instrument's forward direction. Finally, we match the initial localizations of instruments and targets against the candidate regions generated by SAM, yielding precise detection of components in the surgical scene.
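The final matching step described above, pairing a component's positive class-activation region with the SAM-generated candidate regions, can be sketched as a simple IoU-based assignment. This is a minimal illustration, not the paper's implementation: the function name, normalization, and 0.5 activation threshold are assumptions, and the toy CAM and masks are synthetic.

```python
import numpy as np

def match_cam_to_masks(cam, masks, thresh=0.5):
    """Match a class activation map (H x W float array) to the candidate
    binary mask (list of H x W bool arrays) with the highest IoU against
    the thresholded positive-activation region. Returns (index, iou);
    index is -1 when no mask overlaps the activation."""
    # Normalize the CAM to [0, 1] so a fixed threshold is meaningful.
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    active = cam >= thresh  # preliminary localization: positive activation area
    best_idx, best_iou = -1, 0.0
    for i, m in enumerate(masks):
        inter = np.logical_and(active, m).sum()
        union = np.logical_or(active, m).sum()
        iou = inter / union if union else 0.0
        if iou > best_iou:
            best_idx, best_iou = i, iou
    return best_idx, best_iou

# Toy example: the activation peak overlaps the second candidate mask.
cam = np.zeros((8, 8))
cam[2:5, 2:5] = 1.0
masks = [np.zeros((8, 8), bool), np.zeros((8, 8), bool)]
masks[0][6:8, 6:8] = True   # candidate far from the activation
masks[1][2:6, 2:6] = True   # candidate covering the activation
idx, iou = match_cam_to_masks(cam, masks)
```

In practice one would compute a separate matching per component class (instrument and target), but the core assignment reduces to this kind of overlap score between the CAM-derived region and each SAM proposal.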
Results: In ablation studies and comparisons with similar works, our method achieves strong performance without requiring any spatial annotations.
Conclusion: This study introduced a novel weakly supervised method for detecting surgical components by integrating a surgical triplet recognition model with a visual foundation model.
Journal overview:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage