Development and Evaluation of an Artificial Intelligence–Powered Surgical Oral Examination Simulator: A Pilot Study

Arya S. Rao, BA; Siona Prasad, BA; Richard S. Lee, BS; Susan Farrell, MD; Sophia McKinley, MD, MEd; Marc D. Succi, MD

Mayo Clinic Proceedings: Digital Health, Vol. 3, No. 3, Article 100241. Published June 9, 2025. DOI: 10.1016/j.mcpdig.2025.100241
Abstract
Objective
To develop and validate an artificial intelligence–powered platform that simulates surgical oral examinations, addressing the limitations of traditional faculty-led sessions.
Patients and Methods
This cross-sectional study, conducted from June 1, 2024, through December 1, 2024, comprised technical validation and educational assessment of a novel large language model (LLM)–based surgical education tool (surgery oral examination large language model [SOE-LLM]). The study involved 12 surgical clerkship students completing their core rotation at a major academic medical center. The SOE-LLM, using MIMIC-IV–derived surgical cases (acute appendicitis and pancreatitis), was implemented to simulate oral examinations. Technical validation assessed performance across 8 domains, including case presentation accuracy, physical examination findings, historical detail preservation, laboratory data reporting, imaging interpretation, management decisions, and recognition of contraindicated interventions. Educational utility was evaluated using a 5-point Likert scale.
Results
Technical validation demonstrated that the SOE-LLM can function as a consistent oral examiner. The model accurately guided students through case presentations, responded to diagnostic questions, and provided clinically sound responses based on MIMIC-IV cases. When tested with standardized prompts, it maintained examination fidelity, requiring sound diagnostic reasoning and distinguishing operative from medical management. Student evaluations highlighted the platform’s value as an examination preparation tool (mean, 4.250; SEM, 0.1794) and its ability to create a low-stakes environment for practicing high-stakes decisions (mean, 4.833; SEM, 0.1124).
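For readers less familiar with the summary statistics reported above: the standard error of the mean (SEM) is the sample standard deviation divided by the square root of the sample size. A minimal sketch in Python, using hypothetical 5-point Likert ratings for a cohort of 12 (the study's raw ratings are not published in the abstract):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ratings from 12 students on a 5-point Likert scale.
# These are illustrative only, not the study's actual data.
ratings = [5, 4, 4, 5, 4, 5, 4, 3, 5, 4, 4, 5]

n = len(ratings)
m = mean(ratings)               # sample mean
sem = stdev(ratings) / sqrt(n)  # SEM = sample SD / sqrt(n)

print(f"mean = {m:.3f}, SEM = {sem:.4f}")
```

With n = 12, a reported SEM of 0.1794 implies a sample standard deviation of roughly 0.62 rating points, i.e., fairly tight agreement among students.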
Conclusion
The SOE-LLM shows potential as a valuable tool for surgical education, offering a consistent and accessible platform for simulating oral examinations.