Development and Evaluation of an Artificial Intelligence–Powered Surgical Oral Examination Simulator: A Pilot Study

Arya S. Rao, BA; Siona Prasad, BA; Richard S. Lee, BS; Susan Farrell, MD; Sophia McKinley, MD, MEd; Marc D. Succi, MD

Mayo Clinic Proceedings: Digital Health, Vol. 3, No. 3, Article 100241. Published June 9, 2025. DOI: 10.1016/j.mcpdig.2025.100241
Abstract
Objective
To develop and validate an artificial intelligence–powered platform that simulates surgical oral examinations, addressing the limitations of traditional faculty-led sessions.
Patients and Methods
This cross-sectional study, conducted from June 1, 2024, through December 1, 2024, comprised technical validation and educational assessment of a novel large language model (LLM)–based surgical education tool (surgery oral examination large language model [SOE-LLM]). The study involved 12 surgical clerkship students completing their core rotation at a major academic medical center. The SOE-LLM, using MIMIC-IV–derived surgical cases (acute appendicitis and pancreatitis), was implemented to simulate oral examinations. Technical validation assessed performance across 8 domains, including case presentation accuracy, physical examination findings, historical detail preservation, laboratory data reporting, imaging interpretation, management decisions, and recognition of contraindicated interventions. Educational utility was evaluated using a 5-point Likert scale.
Results
Technical validation demonstrated that the SOE-LLM can function as a consistent oral examiner. The model accurately guided students through case presentations, responded to diagnostic questions, and provided clinically sound responses based on MIMIC-IV cases. When tested with standardized prompts, it maintained examination fidelity, requiring sound diagnostic reasoning and distinguishing operative from medical management. Student evaluations highlighted the platform’s value as an examination preparation tool (mean, 4.250; SEM, 0.1794) and its ability to create a low-stakes environment for practicing high-stakes decisions (mean, 4.833; SEM, 0.1124).
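For readers less familiar with the summary statistics reported above: the standard error of the mean (SEM) is the sample standard deviation divided by the square root of the sample size. A minimal sketch in Python, using hypothetical 5-point Likert ratings for a cohort of 12 (the study's raw ratings are not published in the abstract):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ratings from 12 students on a 5-point Likert scale.
# These are illustrative only, not the study's actual data.
ratings = [5, 4, 4, 5, 4, 5, 4, 3, 5, 4, 4, 5]

n = len(ratings)
m = mean(ratings)               # sample mean
sem = stdev(ratings) / sqrt(n)  # SEM = sample SD / sqrt(n)

print(f"mean = {m:.3f}, SEM = {sem:.4f}")
```

With n = 12, a reported SEM of 0.1794 implies a sample standard deviation of roughly 0.62 rating points, i.e., fairly tight agreement among students.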
Conclusion
The SOE-LLM shows potential as a valuable tool for surgical education, offering a consistent and accessible platform for simulating oral examinations.