Generalizability of Artificial Intelligence Assessments in Laparoscopic Surgery Simulation
Erin Kim, BS; Lindsay S. Rosenthal, MD, MS; C. Yoonhee Ryder, MD, MS; Chioma Anidi, MD, MBA; Serena S. Bidwell, MD, MBA, MPH; Deborah M. Rooney, PhD; Joon Yu, BFA; Pawel Forczmanski, PhD, DSc; David R. Jeffcoach, MD; Grace J. Kim, MD
Journal of Surgical Research, vol. 309, pp. 249-256 (published 2025-04-24). DOI: 10.1016/j.jss.2025.03.030
https://www.sciencedirect.com/science/article/pii/S0022480425001519
Abstract
Introduction
The application of artificial intelligence (AI) in the assessment of procedural skills on a simulation platform using the global rating scale (GRS) has shown promise. Our team developed an open-source, low-cost simulation platform for the development of laparoscopic skills in low-resource settings, with skill assessment provided by video-based peer review and AI. The generalizability of AI trained on one procedure to evaluate general procedural skills within a single training system is unknown. This study examines the feasibility of generalizing AI-based assessments across procedures in a training system.
Methods
AI was trained on varied combinations of procedures to score 111 laparoscopic performance videos of four procedures (57 salpingostomies, 20 appendectomies, 15 enterectomies, and 19 diaphragmatic repairs) using time- and distance-based calculations. Predicted scores were generated by K-nearest neighbors with five-fold cross-validation, under both a 5-class (scores 1-5) and a 2-class (pass/fail) scoring system. The videos were also scored conventionally by human video-based review on the GRS competencies.
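The scoring step described above can be sketched in Python as K-nearest neighbors evaluated with five-fold cross-validation. This is a minimal illustration, not the authors' code. It assumes each video has already been reduced to a small vector of time- and distance-based features. The specific features (completion time, instrument path length), k = 3, z-score scaling, stratified fold assignment, and all function names are hypothetical choices that the abstract does not report.

```python
import math
import random
from collections import Counter


def standardize(train_X, test_X):
    """Z-score each feature with statistics from the training fold only."""
    n_feat = len(train_X[0])
    means = [sum(row[j] for row in train_X) / len(train_X) for j in range(n_feat)]
    stds = []
    for j in range(n_feat):
        var = sum((row[j] - means[j]) ** 2 for row in train_X) / len(train_X)
        stds.append(math.sqrt(var) or 1.0)  # guard against a constant feature

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_feat)] for row in rows]

    return scale(train_X), scale(test_X)


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training videos (Euclidean distance)."""
    nearest = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Break ties in favor of the closest neighbor whose label is tied for the most votes.
    return next(label for _, label in nearest if votes[label] == top)


def stratified_folds(y, n_folds=5, seed=0):
    """Assumed stratification: shuffle indices within each class, then deal them round-robin."""
    rng = random.Random(seed)
    by_label = {}
    for i, label in enumerate(y):
        by_label.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    pos = 0
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        for i in members:
            folds[pos % n_folds].append(i)
            pos += 1
    return folds


def cross_val_predict(X, y, k=3, n_folds=5, seed=0):
    """Out-of-fold predictions: each video is scored by a model that never saw it."""
    preds = [None] * len(X)
    for fold in stratified_folds(y, n_folds, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X, test_X = standardize([X[i] for i in train_idx], [X[i] for i in fold])
        train_y = [y[i] for i in train_idx]
        for i, x in zip(fold, test_X):
            preds[i] = knn_predict(train_X, train_y, x, k)
    return preds


if __name__ == "__main__":
    # Hypothetical toy data: (completion time in s, instrument path length in mm) per video,
    # with faster, more economical performances given higher GRS scores. Not study data.
    X = [[700 - 100 * s + j, 6000 - 900 * s + 3 * j] for s in range(1, 6) for j in range(10)]
    y = [s for s in range(1, 6) for _ in range(10)]
    preds = cross_val_predict(X, y)
    print(sum(p == t for p, t in zip(preds, y)) / len(y))
```

The same `cross_val_predict` call covers both scoring systems. Pass GRS scores 1-5 as labels for the 5-class system, or pass/fail labels for the 2-class system. Scaling matters here because seconds and millimeters live on very different ranges and would otherwise let one feature dominate the distance.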
Results
AI assessments achieved 42%-100% concordance with human reviews in the 5-class system and 68%-100% in the 2-class system, P = 0.005. Within the 5-class system, 100% accuracy was reached when AI trained on multiple procedures evaluated appendectomy. The 2-class system attained 100% accuracy in three procedures across the GRS competencies.
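The concordance figures above can be read as simple percent agreement between AI and human scores. The sketch below assumes that definition and a hypothetical pass threshold of 3 on the 1-5 scale. The abstract states neither, and the scores are invented purely for illustration. The example shows why dichotomizing tends to raise agreement: off-by-one disagreements that fall on the same side of the threshold become matches.

```python
def concordance(ai_scores, human_scores):
    """Fraction of videos where the AI score exactly matches the human reviewer's score."""
    if len(ai_scores) != len(human_scores) or not ai_scores:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a == h for a, h in zip(ai_scores, human_scores)) / len(ai_scores)


def to_pass_fail(scores, pass_threshold=3):
    """Collapse 1-5 GRS scores into pass (True) / fail (False); the threshold is hypothetical."""
    return [s >= pass_threshold for s in scores]


if __name__ == "__main__":
    # Hypothetical scores for eight videos; not data from the study.
    human = [1, 2, 3, 3, 4, 4, 5, 5]
    ai = [2, 2, 3, 4, 4, 5, 5, 4]
    print(concordance(ai, human))  # 5-class agreement
    print(concordance(to_pass_fail(ai), to_pass_fail(human)))  # 2-class agreement
```

If pass/fail labels are obtained by thresholding the same 5-class predictions, 2-class agreement can never be lower than 5-class agreement, because every exact match survives the mapping. The study's 2-class results may instead come from a classifier trained directly on pass/fail labels, in which case that guarantee does not apply.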
Conclusions
AI assessment, trained on procedures scored by human video-based review, evaluated laparoscopic skills across different procedures within a simulation-based training system. Dichotomizing scores to pass/fail improved accuracy, supporting the potential of this approach to assess procedural competence.