Enhancing generalization in zero-shot multi-label endoscopic instrument classification.

IF 2.3 3区 医学 Q3 ENGINEERING, BIOMEDICAL
Raphaela Maerkl, Tobias Rueckert, David Rauber, Max Gutbrod, Danilo Weber Nunes, Christoph Palm
{"title":"Enhancing generalization in zero-shot multi-label endoscopic instrument classification.","authors":"Raphaela Maerkl, Tobias Rueckert, David Rauber, Max Gutbrod, Danilo Weber Nunes, Christoph Palm","doi":"10.1007/s11548-025-03439-5","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Recognizing previously unseen classes with neural networks is a significant challenge due to their limited generalization capabilities. This issue is particularly critical in safety-critical domains such as medical applications, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by utilizing additional semantic data, with their performance relying heavily on the quality of the generated embeddings.</p><p><strong>Methods: </strong>This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations, compared to simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.</p><p><strong>Results: </strong>The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9 % to 64.9 %, and the multi-label accuracy from 26.1 % to 79.5 %. Overall performance measured across both seen and unseen classes improves from 49.3 % to 64.9 % in AUROC and from 37.3 % to 65.1 % in multi-label accuracy, highlighting the effectiveness of our approach.</p><p><strong>Conclusion: </strong>These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03439-5","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: Recognizing previously unseen classes with neural networks is a significant challenge due to their limited generalization capabilities. This issue is particularly critical in safety-critical domains such as medical applications, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by utilizing additional semantic data, with their performance relying heavily on the quality of the generated embeddings.

Methods: This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations, compared to simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.

Results: The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9 % to 64.9 %, and the multi-label accuracy from 26.1 % to 79.5 %. Overall performance measured across both seen and unseen classes improves from 49.3 % to 64.9 % in AUROC and from 37.3 % to 65.1 % in multi-label accuracy, highlighting the effectiveness of our approach.

Conclusion: These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.

提高零射多标内镜器械分类的通用性。
目的:由于神经网络有限的泛化能力,用神经网络识别以前看不见的类是一个重大的挑战。这个问题在医疗应用等安全关键领域尤为重要,在这些领域,准确分类对于可靠性和患者安全至关重要。Zero-shot学习方法通过利用额外的语义数据来解决这一挑战,其性能在很大程度上依赖于生成的嵌入的质量。方法:这项工作研究了由句子-BERT模型生成的完整描述性句子作为类表示的使用,并与来自BERT模型的更简单的基于类别的词嵌入进行了比较。此外,还探讨了z-score归一化作为后处理步骤对这些嵌入的影响。该方法在一个多标签广义零射击学习任务上进行了评估,重点研究了微创胆囊切除术内镜图像中手术器械的识别。结果:结果表明,句子嵌入和z分数归一化相结合显著提高了模型的性能。对于未见过的类别,AUROC从43.9%提高到64.9%,多标签准确率从26.1%提高到79.5%。在可见和未见类别中测量的总体性能在AUROC中从49.3%提高到64.9%,在多标签准确度中从37.3%提高到65.1%,突出了我们方法的有效性。结论:这些发现表明句子嵌入和z分数归一化可以显著提高零次学习模型的泛化性能。然而,由于该研究是基于单一数据集的,未来的工作应该在不同的数据集和应用领域验证该方法,以建立其鲁棒性和更广泛的适用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
International Journal of Computer Assisted Radiology and Surgery
International Journal of Computer Assisted Radiology and Surgery ENGINEERING, BIOMEDICAL-RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
CiteScore
5.90
自引率
6.70%
发文量
243
审稿时长
6-12 weeks
期刊介绍: The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信