Development of a Clinical Clerkship Mentor Using Generative AI and Evaluation of Its Effectiveness in a Medical Student Trial Compared to Student Mentors: 2-Part Comparative Study.
{"title":"Development of a Clinical Clerkship Mentor Using Generative AI and Evaluation of Its Effectiveness in a Medical Student Trial Compared to Student Mentors: 2-Part Comparative Study.","authors":"Hayato Ebihara, Hajime Kasai, Ikuo Shimizu, Kiyoshi Shikino, Hiroshi Tajima, Yasuhiko Kimura, Shoichi Ito","doi":"10.2196/76702","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>At the beginning of their clinical clerkships (CCs), medical students face multiple challenges related to acquiring clinical and communication skills, building professional relationships, and managing psychological stress. While mentoring and structured feedback are known to provide critical support, existing systems may not offer sufficient and timely guidance owing to the faculty's limited availability. Generative artificial intelligence, particularly large language models, offers new opportunities to support medical education by providing context-sensitive responses.</p><p><strong>Objective: </strong>This study aimed to develop a generative artificial intelligence CC mentor (AI-CCM) based on ChatGPT and evaluate its effectiveness in supporting medical students' clinical learning, addressing their concerns, and supplementing human mentoring. The secondary objective was to compare AI-CCM's educational value with responses from senior student mentors.</p><p><strong>Methods: </strong>We conducted 2 studies. In study 1, we created 5 scenarios based on challenges that students commonly encountered during CCs. For each scenario, 5 senior student mentors and AI-CCM generated written advice. Five medical education experts evaluated these responses using a rubric to assess accuracy, practical utility, educational appropriateness (5-point Likert scale), and safety (binary scale). In study 2, a total of 17 fourth-year medical students used AI-CCM for 1 week during their CCs and completed a questionnaire evaluating its usefulness, clarity, emotional support, and impact on communication and learning (5-point Likert scale) informed by the technology acceptance model.</p><p><strong>Results: </strong>All results indicated that AI-CCM achieved higher mean scores than senior student mentors. AI-CCM responses were rated higher in educational appropriateness (4.2, SD 0.7 vs 3.8, SD 1.0; P=.001). No significant differences with senior student mentors were observed in accuracy (4.4, SD 0.7 vs 4.2, SD 0.9; P=.11) or practical utility (4.1, SD 0.7 vs 4.0, SD 0.9; P=.35). No safety concerns were identified in AI-CCM responses, whereas 2 concerns were noted in student mentors' responses. Scenario-specific analysis revealed that AI-CCM performed substantially better in emotional and psychological stress scenarios. In the student trial, AI-CCM was rated as moderately useful (mean usefulness score 3.9, SD 1.1), with positive evaluations for clarity (4.0, SD 0.9) and emotional support (3.8, SD 1.1). However, aspects related to feedback guidance (2.9, SD 0.9) and anxiety reduction (3.2, SD 1.0) received more neutral ratings. Students primarily consulted AI-CCM regarding learning workload and communication difficulties; few students used it to address emotional stress-related issues.</p><p><strong>Conclusions: </strong>AI-CCM has the potential to serve as a supplementary educational partner during CCs, offering comparable support to that of senior student mentors in structured scenarios. 
Despite challenges of response latency and limited depth in clinical content, AI-CCM was received well by and accessible to students who used ChatGPT's free version. With further refinements, including specialty-specific content and improved responsiveness, AI-CCM may serve as a scalable, context-sensitive support system in clinical medical education.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e76702"},"PeriodicalIF":3.2000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12447005/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/76702","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Abstract
Background: At the beginning of their clinical clerkships (CCs), medical students face multiple challenges related to acquiring clinical and communication skills, building professional relationships, and managing psychological stress. While mentoring and structured feedback are known to provide critical support, existing systems may not offer sufficient and timely guidance owing to the faculty's limited availability. Generative artificial intelligence, particularly large language models, offers new opportunities to support medical education by providing context-sensitive responses.
Objective: This study aimed to develop a generative artificial intelligence CC mentor (AI-CCM) based on ChatGPT and evaluate its effectiveness in supporting medical students' clinical learning, addressing their concerns, and supplementing human mentoring. The secondary objective was to compare the educational value of AI-CCM's responses with that of responses from senior student mentors.
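The abstract does not describe AI-CCM's implementation. As a rough illustration only, the sketch below shows how a ChatGPT-based mentor of this kind could be wired up through the OpenAI chat completions API; the model name, system prompt, and mentor_reply helper are assumptions made here for illustration, not the authors' configuration (per the Conclusions, the study ran on ChatGPT's own free interface).

```python
# Minimal sketch of a ChatGPT-based clerkship mentor (illustrative, not the
# authors' implementation). Requires the openai package and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed system prompt covering the four challenge areas named in the abstract.
SYSTEM_PROMPT = (
    "You are a mentor for medical students beginning their clinical clerkships. "
    "Give accurate, practical, and educationally appropriate advice on clinical "
    "skills, communication, professional relationships, and psychological stress. "
    "Be supportive, and direct students to faculty or counseling services for "
    "serious safety or mental health concerns."
)

def mentor_reply(student_message: str) -> str:
    """Return one turn of mentoring advice for a student's question."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the study used ChatGPT's interface
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": student_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(mentor_reply("I'm anxious about presenting patients on rounds. Any advice?"))
```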
Methods: We conducted 2 studies. In study 1, we created 5 scenarios based on challenges that students commonly encountered during CCs. For each scenario, 5 senior student mentors and AI-CCM generated written advice. Five medical education experts evaluated these responses using a rubric to assess accuracy, practical utility, educational appropriateness (5-point Likert scale), and safety (binary scale). In study 2, a total of 17 fourth-year medical students used AI-CCM for 1 week during their CCs and completed a questionnaire evaluating its usefulness, clarity, emotional support, and impact on communication and learning (5-point Likert scale) informed by the technology acceptance model.
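As a hedged illustration of the study-1 analysis, the sketch below aggregates 5-point rubric ratings and compares the two response sources. The ratings are synthetic placeholders, and the Mann-Whitney U test is an assumption chosen as a common test for Likert data; the abstract reports P values but does not name the test used.

```python
# Illustrative analysis of 5-point rubric ratings for one criterion.
# Synthetic data; shapes mirror the design (5 experts x 5 scenarios = 25 ratings).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ai_scores = rng.integers(3, 6, size=25)      # placeholder AI-CCM ratings (1-5)
mentor_scores = rng.integers(2, 6, size=25)  # placeholder student-mentor ratings

print(f"AI-CCM:  {ai_scores.mean():.1f} (SD {ai_scores.std(ddof=1):.1f})")
print(f"Mentors: {mentor_scores.mean():.1f} (SD {mentor_scores.std(ddof=1):.1f})")

# Assumed test: two-sided Mann-Whitney U for ordinal Likert ratings.
u, p = stats.mannwhitneyu(ai_scores, mentor_scores, alternative="two-sided")
print(f"Mann-Whitney U={u:.0f}, P={p:.3f}")
```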
Results: Across all rubric criteria, AI-CCM achieved higher mean scores than the senior student mentors. AI-CCM responses were rated significantly higher in educational appropriateness (4.2, SD 0.7 vs 3.8, SD 1.0; P=.001), while no significant differences between AI-CCM and senior student mentors were observed in accuracy (4.4, SD 0.7 vs 4.2, SD 0.9; P=.11) or practical utility (4.1, SD 0.7 vs 4.0, SD 0.9; P=.35). No safety concerns were identified in AI-CCM responses, whereas 2 safety concerns were noted in the student mentors' responses. Scenario-specific analysis revealed that AI-CCM performed substantially better in scenarios involving emotional and psychological stress. In the student trial, AI-CCM was rated as moderately useful (mean usefulness score 3.9, SD 1.1), with positive evaluations for clarity (4.0, SD 0.9) and emotional support (3.8, SD 1.1); however, items on feedback guidance (2.9, SD 0.9) and anxiety reduction (3.2, SD 1.0) received more neutral ratings. Students primarily consulted AI-CCM about learning workload and communication difficulties; few used it for emotional stress-related issues.
Conclusions: AI-CCM has the potential to serve as a supplementary educational partner during CCs, offering support comparable to that of senior student mentors in structured scenarios. Despite challenges of response latency and limited depth of clinical content, AI-CCM was well received by students and remained accessible through ChatGPT's free version. With further refinements, including specialty-specific content and improved responsiveness, AI-CCM may serve as a scalable, context-sensitive support system in clinical medical education.