Can large language models replace standardised patients?
Weipeng Han, Xiaohong Lyu, Ji-Jiang Yang, Mengsha Yan, Yuelun Zhang, Tingyan Wang, Hui Pan, Shi Chen, Jiming Zhu, Xiaoming Huang
Medical Education, vol. 59, no. 5, pp. 552-553. Published 2025-03-06. DOI: 10.1111/medu.15641
https://onlinelibrary.wiley.com/doi/10.1111/medu.15641
Impact factor: 4.9 (JCR Q1, Education, Scientific Disciplines)
Citations: 0
Abstract
Standardised patients (SPs) play a crucial role in medical education by allowing students to practise diagnostic skills in a risk-free environment. This not only boosts students' confidence but also provides them with immediate feedback. However, despite the importance of SPs in training medical professionals, their deployment and integration into educational systems in developing regions face significant obstacles [1]. These include the high cost of training, varying levels of medical education and socio-cultural differences. The emergence of large language models (LLMs) has further catalysed transformations in medical education. Evaluating the effectiveness and reliability of LLMs as substitutes for SPs is especially important in regions with limited medical resources.
To evaluate the viability of LLMs as SPs, we designed a study in which LLMs were prompted to simulate SPs. Video recordings of clinical students' encounters with a human SP were transcribed into text, yielding a dataset of 6600 questions and answers. Both open-source and closed-source LLMs were tested, and their performance was evaluated by independent expert clinical physicians in a blinded manner. Based on these evaluations, we developed a teaching system powered by the most capable LLM. All participants completed two sequentially administered standardised clinical examinations in a repeated-measures design, first with human SPs and then with LLM-simulated SPs. A questionnaire survey, developed through expert consultation and group discussions, was used to assess students' experiences, focusing on exam difficulty, psychological feelings and the effectiveness of role-play.
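The blinded evaluation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `blind` and `unblind`, the source labels, and the scoring scale are all hypothetical, chosen only to show how answers from a human SP and from LLMs might be pooled, stripped of their origin, and re-attributed after rating.

```python
import random
from collections import defaultdict
from statistics import mean


def blind(responses, seed=0):
    """Pool answers from all sources and hide where each one came from.

    `responses` maps a hypothetical source label (e.g. "human_sp", "llm_a")
    to a list of answer strings. Returns (packet, key): the packet is what
    raters see; the key is kept aside for unblinding after rating.
    """
    rng = random.Random(seed)
    items = [(source, text) for source, texts in responses.items() for text in texts]
    rng.shuffle(items)  # randomise order so position does not reveal the source
    packet, key = [], {}
    for i, (source, text) in enumerate(items):
        item_id = f"item-{i:03d}"
        packet.append({"id": item_id, "answer": text})  # no source field
        key[item_id] = source
    return packet, key


def unblind(scores, key):
    """Average rater scores per source once rating is complete."""
    by_source = defaultdict(list)
    for item_id, score in scores.items():
        by_source[key[item_id]].append(score)
    return {source: mean(values) for source, values in by_source.items()}


if __name__ == "__main__":
    # Placeholder answers for illustration only, not study data.
    demo = {"human_sp": ["answer A", "answer B"], "llm_a": ["answer C", "answer D"]}
    packet, key = blind(demo, seed=1)
    for item in packet:
        print(item["id"], item["answer"])
```

Keeping the key separate from the packet is the design point: raters only ever receive the shuffled items, and attribution is restored by the coordinator after all scores are in.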
Utilising LLMs in the role of SPs has generated significant interest and enthusiasm among educators and learners. Additionally, this approach has underscored the vast potential of artificial intelligence in reshaping the landscape of medical education. The expertise and availability of SPs represent a precious resource, and the integration of LLMs can enhance the scope of SP-based instructional resources.
Currently, LLMs are effectively utilised to augment the instructional approach of SPs. They facilitate both pre- and post-practice review sessions with SPs, thereby increasing the number of training instances available to students. The blind test indicated that two LLMs scored higher than SPs. However, survey results revealed that students' ratings of SPs exceeded those of LLMs in terms of examination difficulty and role-play assessment. SPs were rated lower than LLMs on students' psychological experiences, and no significant differences were found in process experiences. LLMs and SPs each have unique strengths, making LLMs a valuable supplement to, rather than a replacement for, SPs. The advantage of LLMs lies in their ability to conduct simulated consultations anytime and anywhere, helping students feel more relaxed and confident. The higher ratings of SPs for examination difficulty and role-play underscore the nuanced and complex role that SPs play in medical education, aspects that current LLMs may not fully replicate. Future research should investigate the role-playing capabilities of LLMs as SPs across multiple languages and explore methods to enhance their performance, such as supervised fine-tuning and continued pre-training. Additionally, developing embodied agents that leverage LLMs would represent a significant advance in medical education methodologies.
Weipeng Han, Xiaohong Lyu, and Ji-Jiang Yang contributed equally to this work, including results interpretation and manuscript preparation, and share co-first authorship. Weipeng Han contributed significantly to the conceptualisation and design of the experiments and methodology. Xiaohong Lyu was instrumental in data collection and analysis. Ji-Jiang Yang was pivotal in devising the software design and bringing it to fruition. Shi Chen, Jiming Zhu, and Xiaoming Huang are credited as co-corresponding authors for their joint supervision of the entire research process and their collaborative leadership. Mengsha Yan critically reviewed and edited the manuscript and provided essential research resources. Yuelun Zhang conducted formal data analysis and contributed to manuscript refinement through editorial review. Tingyan Wang managed data curation, developed software tools, and participated in editorial revisions of the paper. Hui Pan supervised the research and provided expert guidance during the manuscript's review and editing phases.
Ethics approval for the study was obtained through the Peking Union Medical College Hospital Ethics Committee (K4899).
About the journal:
Medical Education seeks to be the pre-eminent journal in the field of education for health care professionals, and publishes material of the highest quality, reflecting worldwide or provocative issues and perspectives.
The journal welcomes high-quality papers on all aspects of health professional education, including:
-undergraduate education
-postgraduate training
-continuing professional development
-interprofessional education