Challenges in multi-centric generalization: phase and step recognition in Roux-en-Y gastric bypass surgery.

IF 2.3; Zone 3 (Medicine); Q3 ENGINEERING, BIOMEDICAL

International Journal of Computer Assisted Radiology and Surgery. Pub Date: 2024-11-01; Epub Date: 2024-05-18; DOI: 10.1007/s11548-024-03166-3

Joël L Lavanchy, Sanat Ramesh, Diego Dall'Alba, Cristians Gonzalez, Paolo Fiorini, Beat P Müller-Stich, Philipp C Nett, Jacques Marescaux, Didier Mutter, Nicolas Padoy

Pages: 2249-2257. Full text: PubMed Central (PMC11541311).

Citations: 0

Abstract

Purpose: Most studies on surgical activity recognition using artificial intelligence (AI) have focused on recognizing a single type of activity from small, mono-centric surgical video datasets. It remains unclear whether such models generalize to other centers.

Methods: In this work, we introduce MultiBypass140, a large multi-centric, multi-activity dataset of 140 videos of laparoscopic Roux-en-Y gastric bypass (LRYGB) surgery performed at two medical centers: the University Hospital of Strasbourg, France (StrasBypass70), and Inselspital, Bern University Hospital, Switzerland (BernBypass70). Two board-certified surgeons fully annotated the dataset with phases and steps. We then benchmark different deep learning models for phase and step recognition and assess their generalizability in seven experiments:
(1) training and evaluation on BernBypass70;
(2) training and evaluation on StrasBypass70;
(3) training and evaluation on the joint MultiBypass140 dataset;
(4) training on BernBypass70, evaluation on StrasBypass70;
(5) training on StrasBypass70, evaluation on BernBypass70;
(6) training on MultiBypass140, evaluation on BernBypass70;
(7) training on MultiBypass140, evaluation on StrasBypass70.
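The seven experiments above form a small train/evaluate matrix over two centers. The Python sketch below encodes that matrix and flags which runs test on a center that contributed no training videos. The constant, class, and property names are hypothetical, and the abstract does not say how videos are split into training and test sets when training and evaluation use the same dataset.

```python
from dataclasses import dataclass

# Dataset names from the abstract; the constants themselves are illustrative.
BERN = "BernBypass70"     # 70 LRYGB videos, Inselspital, Bern
STRAS = "StrasBypass70"   # 70 LRYGB videos, University Hospital of Strasbourg
MULTI = "MultiBypass140"  # union of both centers (140 videos)


@dataclass(frozen=True)
class Experiment:
    """One train/evaluate configuration from the study design."""
    exp_id: int
    train_on: str
    evaluate_on: str

    @property
    def cross_center(self) -> bool:
        # True when the evaluation center contributed no training videos,
        # i.e. a mono-centric model tested on the other center.
        return self.train_on != MULTI and self.train_on != self.evaluate_on


EXPERIMENTS = [
    Experiment(1, BERN, BERN),
    Experiment(2, STRAS, STRAS),
    Experiment(3, MULTI, MULTI),
    Experiment(4, BERN, STRAS),
    Experiment(5, STRAS, BERN),
    Experiment(6, MULTI, BERN),
    Experiment(7, MULTI, STRAS),
]

if __name__ == "__main__":
    for exp in EXPERIMENTS:
        kind = "cross-center" if exp.cross_center else "seen center(s)"
        print(f"({exp.exp_id}) train={exp.train_on:<15} eval={exp.evaluate_on:<15} {kind}")
```

Only experiments (4) and (5) are flagged as cross-center, which matches the two configurations the abstract identifies as the hardest generalization tests.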

Results: Model performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5), confirming the limited generalization capability of models trained on mono-centric data. Using multi-centric training data, as in experiments (6) and (7), improves generalization and brings performance above the level of independent mono-centric training and validation (experiments (1) and (2)).
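Experiments (6) and (7) train once on the joint data and then score each center separately. A minimal sketch of such per-center scoring follows. It assumes plain frame-wise accuracy as the metric (the abstract does not state which metrics were reported), and the function name and toy labels are hypothetical, not data from the study.

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_center_accuracy(
    centers: Sequence[str],
    y_true: Sequence[int],
    y_pred: Sequence[int],
) -> Dict[str, float]:
    """Frame-wise accuracy computed separately for each center.

    Position i describes one video frame: the center it was recorded at,
    its annotated phase (or step) label, and the model's predicted label.
    """
    if not (len(centers) == len(y_true) == len(y_pred)):
        raise ValueError("centers, y_true and y_pred must have the same length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for center, truth, pred in zip(centers, y_true, y_pred):
        total[center] += 1
        correct[center] += int(truth == pred)
    return {center: correct[center] / total[center] for center in total}


if __name__ == "__main__":
    # Toy example with made-up labels (not results from the study):
    # 4 frames from each center, integer phase IDs.
    centers = ["Bern"] * 4 + ["Stras"] * 4
    y_true = [0, 0, 1, 1, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 0, 1, 2, 1, 2]
    print(per_center_accuracy(centers, y_true, y_pred))
    # → {'Bern': 0.75, 'Stras': 0.5}
```

A single pooled accuracy over all eight toy frames would be 5/8 = 0.625, which hides the gap between the two centers; reporting one score per center keeps that difference visible.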

Conclusion: MultiBypass140 shows considerable variation in surgical technique and LRYGB workflow between centers. Consequently, the generalization experiments reveal a marked difference in model performance. These results highlight the importance of multi-centric datasets for AI model generalization, so that models account for variance in surgical technique and workflow. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.

Source journal

International Journal of Computer Assisted Radiology and Surgery ENGINEERING, BIOMEDICAL-RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

CiteScore: 5.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 6-12 weeks

About the journal: The International Journal of Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines and encourages interdisciplinary research and development activities in an international environment.