Real-time surgical tool detection with multi-scale positional encoding and contrastive learning

Impact factor: 2.8; JCR quartile: Q3 (Engineering, Biomedical)
Gerardo Loza, Pietro Valdastri, Sharib Ali
Healthcare Technology Letters, vol. 11, no. 2-3, pp. 48-58. Published 2023-12-07. Journal article.
DOI: 10.1049/htl2.12060
https://onlinelibrary.wiley.com/doi/10.1049/htl2.12060

Abstract

Real-time detection of surgical tools in laparoscopic data plays a vital role in understanding surgical procedures, evaluating the performance of trainees, facilitating learning, and ultimately supporting the autonomy of robotic systems. Existing detection methods for surgical data need to improve both processing speed and prediction accuracy. Most methods rely on anchors or region proposals, which limits their adaptability to variations in tool appearance and leads to sub-optimal detection results. Anchor-free detectors have been explored only partially as a remedy, without remarkable results. An anchor-free, transformer-based architecture that allows real-time tool detection is introduced. The proposed approach uses multi-scale features both in the feature extraction layer and, through positional encoding, in the transformer-based detection architecture, refining and capturing context-aware and structural information for tools of different sizes. Furthermore, a supervised contrastive loss is introduced to optimize the representations of object embeddings, improving the performance of the feed-forward networks that classify the localized bounding boxes. The strategy outperforms state-of-the-art (SOTA) methods. Compared to the most accurate existing SOTA method (DSSS), the approach improves mAP$_{50}$ by nearly 4% and reduces inference time by 113%. It also achieves a 7% higher mAP$_{50}$ than the baseline model.
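
The abstract does not specify how the supervised contrastive loss is formulated. As a rough illustration only, the sketch below follows the commonly used supervised contrastive formulation, in which embeddings sharing a class label are pulled together and all others are pushed apart. The function names, the temperature value, and the toy embeddings are assumptions made for this example and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return list(v)
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss over a batch of embeddings.

    For each anchor i, the positives are the other samples with the same
    label; the loss is the mean negative log-probability of the positives
    under a softmax over all other samples. Anchors without any positive
    are skipped. The temperature default is an assumption.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy object embeddings: two clusters standing in for two tool classes.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1]))
```

In this toy setup the loss is small when the labels agree with the clusters and grows when they do not, which is the property that would encourage class-separable object embeddings ahead of the classification head.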


Source journal: Healthcare Technology Letters (Health Professions: Health Information Management)
CiteScore: 6.10
Self-citation rate: 4.80%
Articles published: 12
Review time: 22 weeks
Journal description: Healthcare Technology Letters aims to bring together an audience of biomedical and electrical engineers, physical and computer scientists, and mathematicians to enable the exchange of the latest ideas and advances through rapid online publication of original healthcare technology research. Major themes of the journal include (but are not limited to):
Major technological/methodological areas: biomedical signal processing; biomedical imaging and image processing; bioinstrumentation (sensors, wearable technologies, etc.); biomedical informatics.
Major application areas: cardiovascular and respiratory systems engineering; neural engineering, neuromuscular systems; rehabilitation engineering; bio-robotics, surgical planning and biomechanics; therapeutic and diagnostic systems, devices and technologies; clinical engineering; healthcare information systems, telemedicine, mHealth.