Generalizability of Artificial Intelligence Assessments in Laparoscopic Surgery Simulation

Impact factor 1.8 | Zone 3 (Medicine) | JCR Q2, Surgery

Journal of Surgical Research Pub Date : 2025-04-24 DOI:10.1016/j.jss.2025.03.030

Erin Kim BS, Lindsay S. Rosenthal MD, MS, C. Yoonhee Ryder MD, MS, Chioma Anidi MD, MBA, Serena S. Bidwell MD, MBA, MPH, Deborah M. Rooney PhD, Joon Yu BFA, Pawel Forczmanski PhD, DSc, David R. Jeffcoach MD, Grace J. Kim MD

Journal of Surgical Research, Volume 309, Pages 249-256. Article page: https://www.sciencedirect.com/science/article/pii/S0022480425001519

Citations: 0

Abstract

Introduction

The application of artificial intelligence (AI) in the assessment of procedural skills on a simulation platform using the global rating scale (GRS) has shown promise. Our team developed an open-source, low-cost simulation platform for the development of laparoscopic skills in low-resource settings, with skill assessment provided by video-based peer review and AI. The generalizability of AI trained on one procedure to evaluate general procedural skills within a single training system is unknown. This study examines the feasibility of generalizing AI-based assessments across procedures in a training system.

Methods

AI was trained, with varied combinations of procedures, to score 111 laparoscopic performance videos of four procedures (57 salpingostomies, 20 appendectomies, 15 enterectomies, and 19 diaphragmatic repairs), using time- and distance-based calculations. Predicted scores were generated using five-fold cross-validation and K-nearest neighbors, with both 5-class (scored 1-5) and 2-class (pass/fail) scoring systems. Videos were also scored in a conventional fashion using human video-based review, based on GRS competencies.
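The abstract does not give implementation details, so the following is only a minimal sketch of what K-nearest-neighbors scoring with five-fold cross-validation can look like. The feature pair (task time, instrument path length), the synthetic data, the choice of k = 3, and all function names are assumptions made for illustration, not the authors' pipeline.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # Break ties in favour of the lower score so the result is deterministic.
    return min(votes, key=lambda label: (-votes[label], label))


def cross_val_predict(features, labels, n_folds=5, k=3, seed=0):
    """Return one out-of-fold prediction per sample using n_folds-fold CV."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    predictions = [None] * len(features)
    for held_out in folds:
        held = set(held_out)
        train_x = [features[i] for i in indices if i not in held]
        train_y = [labels[i] for i in indices if i not in held]
        for i in held_out:
            predictions[i] = knn_predict(train_x, train_y, features[i], k)
    return predictions


def make_synthetic_videos(n=40, seed=1):
    """Hypothetical data: (task time in s, path length in cm) -> score 1-5."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        score = rng.randint(1, 5)
        # Assumption for illustration only: higher scores are faster and shorter.
        time_s = 600 - 90 * score + rng.gauss(0, 20)
        path_cm = 900 - 120 * score + rng.gauss(0, 30)
        features.append((time_s, path_cm))
        labels.append(score)
    return features, labels


features, labels = make_synthetic_videos()
predicted = cross_val_predict(features, labels, n_folds=5, k=3)
```

Each video is scored only by a model that never saw it during training, which is the point of cross-validation. A real pipeline would likely also rescale the features so that neither time nor distance dominates the Euclidean distance; that step is omitted here for brevity.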

Results

AI assessments achieved 42%-100% concordance with human reviews in the 5-class system and 68%-100% in the 2-class system, P = 0.005. Within the 5-class system, 100% accuracy was reached when AI trained on multiple procedures evaluated appendectomy. The 2-class system attained 100% accuracy in three procedures across the GRS competencies.
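To make the concordance figures concrete, here is a small sketch that treats concordance as simple percent agreement between AI and human scores, and shows how collapsing 1-5 scores to pass/fail can raise it. The pass threshold of 3 and the example scores are hypothetical; the abstract does not state the cutoff or the exact concordance statistic used.

```python
def concordance(ai_scores, human_scores):
    """Percent of videos where the AI score equals the human score."""
    if len(ai_scores) != len(human_scores) or not ai_scores:
        raise ValueError("score lists must be non-empty and the same length")
    matches = sum(a == h for a, h in zip(ai_scores, human_scores))
    return 100.0 * matches / len(ai_scores)


def to_pass_fail(scores, pass_threshold=3):
    """Collapse 1-5 scores to pass/fail. The threshold is a hypothetical choice."""
    return [score >= pass_threshold for score in scores]


# Made-up scores for five videos, not data from the study.
human = [1, 2, 3, 4, 5]
ai = [2, 2, 4, 4, 5]

five_class = concordance(ai, human)                              # 60.0
two_class = concordance(to_pass_fail(ai), to_pass_fail(human))   # 100.0
```

In this toy case the two disagreements are off-by-one errors that fall on the same side of the threshold, so they disappear once scores are dichotomized. That is one plausible mechanism for the higher 2-class concordance, not an explanation given in the abstract.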

Conclusions

AI assessment trained on procedures using video-based review evaluated laparoscopic skills across different procedures within a simulation-based training system. Dichotomizing scoring to pass/fail improved accuracy, while supporting the potential to assess procedural competence.


Source journal

Journal of Surgical Research (Medicine: Surgery)

CiteScore: 3.90

Self-citation rate: 4.50%

Articles published: 627

Review time: 138 days

About the journal: The Journal of Surgical Research: Clinical and Laboratory Investigation publishes original articles concerned with clinical and laboratory investigations relevant to surgical practice and teaching. The journal emphasizes reports of clinical investigations or fundamental research bearing directly on surgical management that will be of general interest to a broad range of surgeons and surgical researchers. The articles presented need not have been the products of surgeons or of surgical laboratories. The Journal of Surgical Research also features review articles and special articles relating to educational, research, or social issues of interest to the academic surgical community.