Enhancing generalization in zero-shot multi-label endoscopic instrument classification.

IF 2.3 3区医学 Q3 ENGINEERING, BIOMEDICAL

International Journal of Computer Assisted Radiology and Surgery Pub Date : 2025-06-11 DOI:10.1007/s11548-025-03439-5

Raphaela Maerkl, Tobias Rueckert, David Rauber, Max Gutbrod, Danilo Weber Nunes, Christoph Palm

{"title":"Enhancing generalization in zero-shot multi-label endoscopic instrument classification.","authors":"Raphaela Maerkl, Tobias Rueckert, David Rauber, Max Gutbrod, Danilo Weber Nunes, Christoph Palm","doi":"10.1007/s11548-025-03439-5","DOIUrl":null,"url":null,"abstract":"Purpose: Recognizing previously unseen classes with neural networks is a significant challenge due to their limited generalization capabilities. This issue is particularly critical in safety-critical domains such as medical applications, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by utilizing additional semantic data, with their performance relying heavily on the quality of the generated embeddings.Methods: This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations, compared to simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.Results: The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9 % to 64.9 %, and the multi-label accuracy from 26.1 % to 79.5 %. Overall performance measured across both seen and unseen classes improves from 49.3 % to 64.9 % in AUROC and from 37.3 % to 65.1 % in multi-label accuracy, highlighting the effectiveness of our approach.Conclusion: These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03439-5","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}

引用次数: 0

Abstract

Purpose: Recognizing previously unseen classes with neural networks is a significant challenge due to their limited generalization capabilities. This issue is particularly critical in safety-critical domains such as medical applications, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by utilizing additional semantic data, with their performance relying heavily on the quality of the generated embeddings.

Methods: This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations, compared to simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.

Results: The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9 % to 64.9 %, and the multi-label accuracy from 26.1 % to 79.5 %. Overall performance measured across both seen and unseen classes improves from 49.3 % to 64.9 % in AUROC and from 37.3 % to 65.1 % in multi-label accuracy, highlighting the effectiveness of our approach.

Conclusion: These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.

查看原文本刊更多论文

提高零射多标内镜器械分类的通用性。

目的：由于神经网络有限的泛化能力，用神经网络识别以前看不见的类是一个重大的挑战。这个问题在医疗应用等安全关键领域尤为重要，在这些领域，准确分类对于可靠性和患者安全至关重要。Zero-shot学习方法通过利用额外的语义数据来解决这一挑战，其性能在很大程度上依赖于生成的嵌入的质量。方法：这项工作研究了由句子-BERT模型生成的完整描述性句子作为类表示的使用，并与来自BERT模型的更简单的基于类别的词嵌入进行了比较。此外，还探讨了z-score归一化作为后处理步骤对这些嵌入的影响。该方法在一个多标签广义零射击学习任务上进行了评估，重点研究了微创胆囊切除术内镜图像中手术器械的识别。结果：结果表明，句子嵌入和z分数归一化相结合显著提高了模型的性能。对于未见过的类别，AUROC从43.9%提高到64.9%，多标签准确率从26.1%提高到79.5%。在可见和未见类别中测量的总体性能在AUROC中从49.3%提高到64.9%，在多标签准确度中从37.3%提高到65.1%，突出了我们方法的有效性。结论：这些发现表明句子嵌入和z分数归一化可以显著提高零次学习模型的泛化性能。然而，由于该研究是基于单一数据集的，未来的工作应该在不同的数据集和应用领域验证该方法，以建立其鲁棒性和更广泛的适用性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Computer Assisted Radiology and Surgery ENGINEERING, BIOMEDICAL-RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

CiteScore

5.90

自引率

6.70%

发文量

243

审稿时长

6-12 weeks

期刊介绍： The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.