Yavuz Selim Kıyak, Emre Emekli, Tuğba İş Kara, Özlem Coşkun, Işıl İrem Budakoğlu
{"title":"人工智能教授医学生外科诊断推理:使用全自动低成本反馈系统的实验证据","authors":"Yavuz Selim Kıyak , Emre Emekli , Tuğba İş Kara , Özlem Coşkun , Işıl İrem Budakoğlu","doi":"10.1016/j.jsurg.2025.103639","DOIUrl":null,"url":null,"abstract":"<div><h3>OBJECTIVE</h3><div>While AI-generated feedback has shown promise in medical education, prior studies have only used AI for feedback, with question design handled by human experts, and the process required human involvement. This study aimed to evaluate the effectiveness of a fully automated AI-based system that generates both multiple-choice questions (MCQs) and personalized feedback, without any human input, on improving diagnostic reasoning in preclinical medical students.</div></div><div><h3>DESIGN</h3><div>A prospective, parallel-group, interventional study. The intervention group (Year-1 students) received AI-generated MCQs and feedback over 5 days using a web platform, coded via “vibe coding,” with spaced repetition. The diagnoses covered included 5 abdominal pain conditions: acute appendicitis, acute cholecystitis, acute pancreatitis, acute gastroenteritis, and nephrolithiasis. Diagnostic performance was assessed via an Objective Structured Video Examination (OSVE), immediately and 2 weeks postintervention. The control group (Year-2 students) completed the OSVE once.</div></div><div><h3>SETTING</h3><div>Gazi University Faculty of Medicine, Ankara, Turkiye; institutional academic setting focused on undergraduate medical education.</div></div><div><h3>PARTICIPANTS</h3><div>Thirty-eight Year-1 medical students completed the intervention. Thirty-three Year-2 students served as a non-randomized control group. All intervention participants completed the immediate assessment; 30 completed the delayed assessment.</div></div><div><h3>RESULTS</h3><div>Intervention participants outperformed the control group in diagnosing the 5 abdominal pain conditions immediately after the intervention (p < 0.001) and at the 2-week follow-up (p < 0.001). Postintervention expert review confirmed the accuracy of all AI-generated questions and identified minimal issues in 0.6% of feedback statements. Total AI cost was $0.51.</div></div><div><h3>CONCLUSIONS</h3><div>A fully automated, low-cost AI system without human in the loop during content generation can significantly enhance illness scripts in preclinical medical students. Early engagement with such tools may help students strengthen surgical diagnostic skills and derive greater benefit from clinical environments. This kind of tools may transform how clinical reasoning is taught in resource-limited or high-volume educational settings.</div></div>","PeriodicalId":50033,"journal":{"name":"Journal of Surgical Education","volume":"82 10","pages":"Article 103639"},"PeriodicalIF":2.1000,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AI Teaches Surgical Diagnostic Reasoning to Medical Students: Evidence from an Experiment Using a Fully Automated, Low-Cost Feedback System\",\"authors\":\"Yavuz Selim Kıyak , Emre Emekli , Tuğba İş Kara , Özlem Coşkun , Işıl İrem Budakoğlu\",\"doi\":\"10.1016/j.jsurg.2025.103639\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>OBJECTIVE</h3><div>While AI-generated feedback has shown promise in medical education, prior studies have only used AI for feedback, with question design handled by human experts, and the process required human involvement. 
This study aimed to evaluate the effectiveness of a fully automated AI-based system that generates both multiple-choice questions (MCQs) and personalized feedback, without any human input, on improving diagnostic reasoning in preclinical medical students.</div></div><div><h3>DESIGN</h3><div>A prospective, parallel-group, interventional study. The intervention group (Year-1 students) received AI-generated MCQs and feedback over 5 days using a web platform, coded via “vibe coding,” with spaced repetition. The diagnoses covered included 5 abdominal pain conditions: acute appendicitis, acute cholecystitis, acute pancreatitis, acute gastroenteritis, and nephrolithiasis. Diagnostic performance was assessed via an Objective Structured Video Examination (OSVE), immediately and 2 weeks postintervention. The control group (Year-2 students) completed the OSVE once.</div></div><div><h3>SETTING</h3><div>Gazi University Faculty of Medicine, Ankara, Turkiye; institutional academic setting focused on undergraduate medical education.</div></div><div><h3>PARTICIPANTS</h3><div>Thirty-eight Year-1 medical students completed the intervention. Thirty-three Year-2 students served as a non-randomized control group. All intervention participants completed the immediate assessment; 30 completed the delayed assessment.</div></div><div><h3>RESULTS</h3><div>Intervention participants outperformed the control group in diagnosing the 5 abdominal pain conditions immediately after the intervention (p < 0.001) and at the 2-week follow-up (p < 0.001). Postintervention expert review confirmed the accuracy of all AI-generated questions and identified minimal issues in 0.6% of feedback statements. Total AI cost was $0.51.</div></div><div><h3>CONCLUSIONS</h3><div>A fully automated, low-cost AI system without human in the loop during content generation can significantly enhance illness scripts in preclinical medical students. Early engagement with such tools may help students strengthen surgical diagnostic skills and derive greater benefit from clinical environments. This kind of tools may transform how clinical reasoning is taught in resource-limited or high-volume educational settings.</div></div>\",\"PeriodicalId\":50033,\"journal\":{\"name\":\"Journal of Surgical Education\",\"volume\":\"82 10\",\"pages\":\"Article 103639\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Surgical Education\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S193172042500220X\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Surgical Education","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S193172042500220X","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
AI Teaches Surgical Diagnostic Reasoning to Medical Students: Evidence from an Experiment Using a Fully Automated, Low-Cost Feedback System
OBJECTIVE
While AI-generated feedback has shown promise in medical education, prior studies used AI only to generate feedback; question design was handled by human experts, so the process still required human involvement. This study aimed to evaluate the effectiveness of a fully automated AI-based system that generates both multiple-choice questions (MCQs) and personalized feedback, without any human input, in improving diagnostic reasoning in preclinical medical students.
DESIGN
A prospective, parallel-group, interventional study. The intervention group (Year-1 students) received AI-generated MCQs and feedback over 5 days, with spaced repetition, on a web platform built via “vibe coding.” The content covered 5 abdominal pain diagnoses: acute appendicitis, acute cholecystitis, acute pancreatitis, acute gastroenteritis, and nephrolithiasis. Diagnostic performance was assessed via an Objective Structured Video Examination (OSVE) immediately and 2 weeks postintervention. The control group (Year-2 students) completed the OSVE once.
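A minimal sketch of what such a fully automated pipeline might look like, assuming an OpenAI-style chat API. The model choice, prompt wording, JSON schema, and fixed daily review schedule below are illustrative assumptions, not the authors' actual implementation:

```python
# Sketch of an automated MCQ-and-feedback pipeline with spaced repetition.
# Assumptions (not from the paper): an OpenAI-style chat API, JSON output,
# and one review per day over the 5-day study window.
import json
from datetime import date, timedelta

from openai import OpenAI  # assumed LLM client; any chat API would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DIAGNOSES = [
    "acute appendicitis", "acute cholecystitis", "acute pancreatitis",
    "acute gastroenteritis", "nephrolithiasis",
]

def generate_mcq(diagnosis: str) -> dict:
    """Ask the model for one clinical-vignette MCQ plus per-option feedback."""
    prompt = (
        f"Write a single-best-answer MCQ testing diagnosis of {diagnosis} "
        "in a patient with abdominal pain. Return JSON with keys: 'stem', "
        "'options' (5 strings), 'answer' (correct option index), and "
        "'feedback' (one explanation per option)."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

def schedule_reviews(start: date, days: int = 5) -> list[date]:
    """Naive spaced-repetition schedule: one review date per day."""
    return [start + timedelta(days=i) for i in range(days)]

if __name__ == "__main__":
    for dx in DIAGNOSES:
        mcq = generate_mcq(dx)
        print(dx, "->", mcq["stem"][:60], "...")
    print("review dates:", schedule_reviews(date.today()))
```

The sketch only illustrates the overall loop of question generation, delivery, and spaced review; the study's platform, prompts, and repetition schedule were themselves produced via “vibe coding.”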
SETTING
Gazi University Faculty of Medicine, Ankara, Türkiye; institutional academic setting focused on undergraduate medical education.
PARTICIPANTS
Thirty-eight Year-1 medical students completed the intervention. Thirty-three Year-2 students served as a non-randomized control group. All intervention participants completed the immediate assessment; 30 completed the delayed assessment.
RESULTS
Intervention participants outperformed the control group in diagnosing the 5 abdominal pain conditions immediately after the intervention (p < 0.001) and at the 2-week follow-up (p < 0.001). Postintervention expert review confirmed the accuracy of all AI-generated questions and identified minimal issues in 0.6% of feedback statements. Total AI cost was $0.51.
CONCLUSIONS
A fully automated, low-cost AI system with no human in the loop during content generation can significantly enhance illness scripts in preclinical medical students. Early engagement with such tools may help students strengthen surgical diagnostic skills and derive greater benefit from clinical environments. Tools of this kind may transform how clinical reasoning is taught in resource-limited or high-volume educational settings.
Journal Introduction:
The Journal of Surgical Education (JSE) is dedicated to advancing the field of surgical education through original research. The journal publishes research articles in all surgical disciplines on topics relevant to the education of surgical students, residents, and fellows, as well as practicing surgeons. Our readers look to JSE for timely, innovative research findings from the international surgical education community. As the official journal of the Association of Program Directors in Surgery (APDS), JSE publishes the proceedings of the annual APDS meeting held during Surgery Education Week.