Can large language models replace standardised patients?

IF 4.9, Zone 1 (Education), Q1 EDUCATION, SCIENTIFIC DISCIPLINES

Medical Education, Pub Date: 2025-03-06, DOI: 10.1111/medu.15641

Weipeng Han, Xiaohong Lyu, Ji-Jiang Yang, Mengsha Yan, Yuelun Zhang, Tingyan Wang, Hui Pan, Shi Chen, Jiming Zhu, Xiaoming Huang

Medical Education, vol. 59, no. 5, pp. 552-553. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1111/medu.15641

Citations: 0

Abstract

Standardised patients (SPs) play a crucial role in medical education by allowing students to practise diagnostic skills in a risk-free environment. This not only boosts their confidence but also provides them with immediate feedback. However, despite their importance in training medical professionals, the deployment and integration of SPs into educational systems in developing regions face significant obstacles.¹ These include the high cost of training, varying levels of medical education and socio-cultural differences. The emergence of large language models (LLMs) has further catalysed transformations in medical education. Evaluating the effectiveness and reliability of LLMs as substitutes for SPs is especially important in regions with limited medical resources.

To evaluate the viability of LLMs as SPs, we designed a study where LLMs were prompted to simulate SPs. The process involved transcribing video recordings of clinical student encounters with a human SP into text, which resulted in a dataset of 6600 questions and answers. Open-source and closed-source LLMs were tested, and their performance was evaluated by independent expert clinical physicians in a blinded manner. Based on the evaluations, we developed a teaching system powered by the most capable LLM. All participants completed two sequentially administered standardised clinical examinations using a repeated-measures design, first with human SPs and then with LLM-simulated SPs. A questionnaire survey, developed through expert consultation and group discussions, was used to assess students' experiences, focusing on exam difficulty, psychological feelings and the effectiveness of role-play.

Utilising LLMs in the role of SPs has generated significant interest and enthusiasm among educators and learners. Additionally, this approach has underscored the vast potential of artificial intelligence in reshaping the landscape of medical education. The expertise and availability of SPs represent a precious resource, and the integration of LLMs can enhance the scope of SP-based instructional resources.

Currently, LLMs are effectively utilised to augment the instructional approach of SPs. They facilitate both pre- and post-practice review sessions with SPs, thereby enhancing the number of training instances available to students. The blind test indicated that two LLMs scored higher than SPs. However, survey results revealed that students' ratings of SPs exceeded those of LLMs in terms of examination difficulty and role-play assessment. SPs were found to be less effective than LLMs in terms of students' psychological experiences, and no significant differences were found in process experiences. LLMs and SPs each have unique strengths, making LLMs a valuable supplement to, rather than a replacement for, SPs. The advantage of LLMs lies in their ability to conduct simulated consultations anytime and anywhere, helping students feel more relaxed and confident. In contrast, students rated SPs higher in terms of examination difficulty and role-play assessment. This feedback underscores the nuanced and complex role that SPs play in medical education, aspects that current LLMs may not fully replicate. Future research should investigate the role-playing capabilities of LLMs as SPs across multiple languages and explore methods to enhance their performance, such as supervised fine-tuning and continued pre-training. Additionally, developing embodied agents that leverage LLMs represents a significant advancement in medical education methodologies.

Weipeng Han, Xiaohong Lyu, and Ji-Jiang Yang contributed equally to this work, including results interpretation and manuscript preparation, and share co-first authorship. Weipeng Han contributed significantly to the conceptualisation and design of the experiments and methodology. Xiaohong Lyu was instrumental in data collection and analysis. Ji-Jiang Yang was pivotal in devising the software design and bringing it to fruition. Shi Chen, Jiming Zhu, and Xiaoming Huang are credited as co-corresponding authors for their joint supervision of the entire research process and their collaborative leadership. Mengsha Yan performed critical review and editing of the manuscript while providing essential research resources. Yuelun Zhang conducted formal data analysis and contributed to manuscript refinement through editorial review. Tingyan Wang managed data curation, developed software tools, and participated in editorial revisions of the paper. Hui Pan oversaw research supervision and provided expert guidance during the manuscript's review and editing phases.

Ethics approval for the study was obtained through the Peking Union Medical College Hospital Ethics Committee (K4899).


Source journal

Medical Education (Medicine: Health Care)

CiteScore: 8.40

Self-citation rate: 10.00%

Articles published: 279

Review time: 4-8 weeks

About the journal: Medical Education seeks to be the pre-eminent journal in the field of education for health care professionals, and publishes material of the highest quality, reflecting worldwide or provocative issues and perspectives. The journal welcomes high-quality papers on all aspects of health professional education, including undergraduate education, postgraduate training, continuing professional development and interprofessional education.